








EDITORIAL STAFF

EDITOR
WILLIAM KRUSKAL

ASSOCIATE EDITORS
ALLAN BIRNBAUM    DONALD A. DARLING    OSCAR KEMPTHORNE
Z. W. BIRNBAUM    WASSILY HOEFFDING    E. L. LEHMANN
N. L. JOHNSON

WITH THE COOPERATION OF

R. Bum    Cyrus Derman    Solomon Kullback
R. C. Bose    J. L. Doob    Eugene Lukacs
D. L. Burkholder    Meyer Dwass    G. E. Noether
D. G. Chapman    D. A. S. Fraser    Howard Raiffa
W. S. Connor    Samuel Karlin    H. E. Robbins
D. R. Cox    Harry Kesten    Walter L. Smith
F. Day    C. H. Kraft    Lionel Weiss

PAST EDITORS OF THE ANNALS

H. C. Carver, 1930-1938    T. W. Anderson, 1950-1952
S. S. Wilks, 1938-1949    E. L. Lehmann, 1953-1955
T. E. Harris, 1955-1958


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


IMS INSTITUTIONAL MEMBERS 


ABERDEEN PROVING GROUNDS (BALLISTIC RESEARCH LABORATORIES), Aberdeen, Maryland

AMERICAN VISCOSE CORPORATION, Marcus Hook, Pennsylvania

BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York 14, New York

CORNELL UNIVERSITY, MATHEMATICS DEPARTMENT, Ithaca, New York

GENERAL ANALYSIS CORPORATION, 11753 Wilshire Blvd., Los Angeles 25, California

INDIANA UNIVERSITY, COMMITTEE ON STATISTICS, Bloomington, Indiana

INTERNATIONAL BUSINESS MACHINES CORPORATION, New York

IOWA STATE COLLEGE, STATISTICAL LABORATORY, Ames, Iowa

LOCKHEED AIRCRAFT CORPORATION, Burbank, California

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, HAYDEN LIBRARY, PERIODICAL DEPARTMENT, Cambridge 39, Massachusetts

MICHIGAN STATE UNIVERSITY, DEPARTMENT OF STATISTICS, East Lansing, Michigan

NATIONAL SECURITY AGENCY, Fort George G. Meade, Maryland

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL STATISTICS, Princeton, New Jersey

PURDUE UNIVERSITY, Lafayette, Indiana

SPACE TECHNOLOGY LABORATORIES, P. O. Box 95001, Los Angeles 45, California

STANFORD UNIVERSITY, STATISTICS DEPARTMENT, Stanford, California

STATE UNIVERSITY OF IOWA, Iowa City, Iowa

THE CATHOLIC UNIVERSITY OF AMERICA, STATISTICAL LABORATORY, DEPARTMENT OF MATHEMATICS, Washington, D. C.

THE RAMO-WOOLDRIDGE CORPORATION, Los Angeles, California

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California

UNIVERSITY OF ILLINOIS, Urbana, Illinois

UNIVERSITY OF NORTH CAROLINA, DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina

UNIVERSITY OF PUERTO RICO, SCHOOL OF TROPICAL MEDICINE, San Juan, Puerto Rico

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington





A cumulative index for Volumes 1 through 30 of the Annals of Mathematical 
Statistics is anticipated. A committee of the Institute, under the chairmanship 
of I. R. Savage, is working out detailed plans for the construction of the index. 
Two earlier cumulative indexes covered Volumes 1-10 and 1-20 respectively. 

The index will probably list correction notes, acknowledgments of priority, 
and the like along with the papers to which they pertain. Authors of papers who 
have correction notes or similar materials are urged to send them to the Editor 
as soon as possible, so that they may appear in the December, 1959, issue, and 
hence in the cumulative index. 





ESTIMATING THE PARAMETERS OF A DIFFERENTIAL PROCESS¹

By HERMAN RUBIN AND HOWARD G. TUCKER
University of Oregon and University of California, Riverside 


1. Introduction and summary. Let $X$ denote a differential process, i.e., a stochastic process with independent increments for which the distribution of $X(t+h) - X(t)$ depends only on $h$. The parameter $t$ runs through the interval $[0, +\infty)$, and the usual initial condition $P\{X(0) = 0\} = 1$ is assumed. Then it is known that the distribution of $X(t)$ is infinitely divisible, i.e., the logarithm of its characteristic function can be written as

$$\log f_{X(t)}(u) = i\gamma t u + t\int_{-\infty}^{+\infty}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x), \tag{1.1}$$

the integrand being defined at $x = 0$ by its limiting value $-u^2/2$. In this canonical representation, $\gamma$ is a real constant, and $G$ is a bounded, non-decreasing function, it being permissible always to take $G(-\infty) = 0$. In the usual probabilistic terminology, the probability law of $X(t)$ is a convolution of a normal law and a (possibly infinite) number of Poisson laws, or a limit of such laws. The function $G$ is called the jump function; its saltus at $x = 0$, $\sigma^2 = G(+0) - G(-0)$, is the variance of the normal component, and its set of points of increase for $x \neq 0$ gives information as to the nature of the Poisson components, viz., the "relative density" of the magnitudes of the discontinuities of the sample function. The purpose of this paper is to derive estimates for this jump function $G$ and for the "trend term", $\gamma$. Two estimates of $G$ are obtained, and one estimate is obtained for $\gamma$.
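To make (1.1) concrete, the following Python sketch evaluates $\log f_{X(t)}(u)$ when $G$ is a step function with finitely many atoms. This restriction is an illustrative assumption (the paper allows any bounded non-decreasing $G$), and the name `log_cf` and the list-of-atoms encoding are hypothetical. As a check, it recovers the pure Poisson law $iuat + \lambda t(e^{iub} - 1)$ used as an example in Section 2, written in the form (1.1) with $\gamma = a + \lambda b/(1+b^2)$ and a single atom of mass $\lambda b^2/(1+b^2)$ at $b$.

```python
import cmath


def log_cf(u, t, gamma, atoms):
    """log f_{X(t)}(u) from (1.1) when G is a step function.

    atoms: list of (x, mass) pairs describing the jumps of G.  An atom at
    x = 0 carries the normal variance sigma^2; there the integrand of (1.1)
    is replaced by its limiting value -u^2/2.
    """
    total = 1j * gamma * t * u
    for x, mass in atoms:
        if x == 0:
            integrand = -u * u / 2
        else:
            integrand = (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x * x)) * (1 + x * x) / (x * x)
        total += t * integrand * mass
    return total


# Hypothetical example: the pure Poisson process of Section 2
# (drift a, single jump size b, rate lam) rewritten in the form (1.1).
a, b, lam, t = 0.7, 1.5, 2.0, 3.0
gamma = a + lam * b / (1 + b * b)
atoms = [(b, lam * b * b / (1 + b * b))]
direct = 1j * 0.4 * a * t + lam * t * (cmath.exp(1j * 0.4 * b) - 1)
print(abs(log_cf(0.4, t, gamma, atoms) - direct))  # should be at rounding level
```

The atom at $x = 0$ is handled separately because the factor $(1+x^2)/x^2$ is singular there; this is exactly how the normal component enters (1.1).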

In the first method of estimating $G$, considered in §2, it is assumed that the experimenter can observe a sample function of $X$ at any finite number of values of $t$ that he chooses. Accordingly, for any integer $n$, let

$$X_{nk} = X\Big(\frac{k}{n}\Big) - X\Big(\frac{k-1}{n}\Big), \qquad k = 1, 2, \ldots.$$

Then the estimate $G_{N,n}(u)$ of $G(u)$ is defined as

$$G_{N,n}(u) = \frac{n}{N}\sum_{k=1}^{N}\frac{X_{nk}^2}{1+X_{nk}^2}\,I_{[X_{nk}\le u]}, \tag{1.2}$$

where

$$I_{[X_{nk}\le u]} = \begin{cases}1 & \text{if } X_{nk} \le u,\\ 0 & \text{if } X_{nk} > u,\end{cases}$$

and $N = [Tn]$, $T$ being the largest value of $t$ observed. It is proved that this estimate is strongly consistent in the following sense:


Received January 20, 1958; revised March 16, 1959. 

¹ This research was supported by the Office of Ordnance Research under contract No.
DA-04-200-ORD-539. Reproduction in whole or in part is permitted for purposes of the 
United States Government. 


641 





642 HERMAN RUBIN AND HOWARD G. TUCKER 


$$P\Big\{\lim_{n\to\infty}\lim_{N\to\infty} G_{N,n}(u) = G(u) \text{ for all } u \in C(G)\Big\} = 1, \tag{1.3}$$

where $C(G)$ denotes the set of all values of $u$ at which $G$ is continuous. This estimate is not necessarily an unbiased estimate of $G(u)$ for all $u \in C(G)$.
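A minimal sketch of the estimate (1.2), assuming the observations are supplied as the list $X(0), X(1/n), \ldots, X(N/n)$ on the grid of mesh $1/n$. The function name and input format are illustrative choices, not taken from the paper.

```python
def first_estimate(path, n, u):
    """G_{N,n}(u) of (1.2).

    path: observed values X(0), X(1/n), ..., X(N/n) on the grid of mesh 1/n,
    so that N = len(path) - 1 increments X_{nk} are available.
    """
    increments = [path[k] - path[k - 1] for k in range(1, len(path))]
    N = len(increments)
    if N == 0:
        raise ValueError("need at least two observations")
    # sum of X_{nk}^2 / (1 + X_{nk}^2) over the increments with X_{nk} <= u
    total = sum(x * x / (1 + x * x) for x in increments if x <= u)
    return n * total / N


# Increments 1, -1, 2 with n = 2; only -1 satisfies X_{nk} <= 0.
print(first_estimate([0.0, 1.0, 0.0, 2.0], n=2, u=0.0))  # 0.333...
```

As a function of $u$ the result is a non-decreasing step function, which is what allows (1.3) to hold simultaneously for all $u \in C(G)$.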

The second method for estimating $G$, developed in §3, requires that the experimenter be able to observe all the discontinuities of a sample function of $X$ on a finite interval in addition to being able to observe $X(t)$ at any finite set of values of $t$. Not only is a consistent estimate obtained for $G$, but also a consistent estimate is obtained for the variance of the normal component, $\sigma^2 = G(+0) - G(-0)$. Let $\{k_n\}$ denote a sequence of positive integers such that

$$\sum_{n} k_n^{-1} < \infty. \tag{1.4}$$

Further, let $T_0$ be a fixed value of time $T$, let

$$S_n^2 = \sum_{k=1}^{k_n}\Big\{X\Big(\frac{kT_0}{k_n}\Big) - X\Big(\frac{(k-1)T_0}{k_n}\Big)\Big\}^2, \tag{1.5}$$

and

$$D^2 = \text{the sum of the squares of the jumps of the sample function during the time interval } [0, T_0]. \tag{1.6}$$

Finally, let

$$J_T(u) = \int_{-\infty}^{u}\frac{b^2}{1+b^2}\,dN_T(b), \tag{1.7}$$

where, for every Borel set $B$, $N_T(B)$ denotes the number of discontinuities observed for $X$ during $[0, T]$ whose magnitudes lie in $B$. The estimate $G_{n,T}(u)$ of $G(u)$ is then constructed as follows:

$$G_{n,T}(u) = \begin{cases}\dfrac{1}{T}J_T(u) & \text{if } u < 0,\\[1.5ex] \dfrac{1}{T}J_T(u) + \dfrac{1}{T_0}\big(S_n^2 - D^2\big) & \text{if } u \ge 0.\end{cases} \tag{1.8}$$

The estimate $G_{n,T}(u)$ is an unbiased estimate of $G(u)$ if $\sigma^2 = 0$ or if $u < 0$, but in any case $G_{n,T}(b) - G_{n,T}(a)$ is an unbiased estimate of $G(b) - G(a)$, provided $0 \notin [a, b]$. Also, this estimate is consistent in the following sense:

$$\lim_{\substack{T\to\infty\\ n\to\infty}} G_{n,T}(u) = G(u) \tag{1.9}$$

with probability one for every $u$. In addition, $(1/T_0)[S_n^2 - D^2]$ is a consistent estimate of $\sigma^2 = G(+0) - G(-0)$, the variance of the normal component, in the sense that $\lim_{n\to\infty}(1/T_0)[S_n^2 - D^2] = \sigma^2$ with probability 1.
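The estimate (1.8) can be sketched the same way, assuming the observer has the list of jump sizes on $[0, T]$, the list of jump sizes on $[0, T_0]$, and the grid values $X(kT_0/k_n)$, $k = 0, \ldots, k_n$. These input conventions and the function name are hypothetical. The term $(S_n^2 - D^2)/T_0$ added for $u \ge 0$ is the estimate of $\sigma^2$.

```python
def second_estimate(u, jumps_T, T, grid_T0, jumps_T0, T0):
    """G_{n,T}(u) of (1.8).

    jumps_T:  sizes of all jumps observed during [0, T]
    grid_T0:  X(0), X(T0/k_n), ..., X(T0)   (k_n + 1 values)
    jumps_T0: sizes of the jumps observed during [0, T0]
    """
    J_T = sum(b * b / (1 + b * b) for b in jumps_T if b <= u)    # (1.7)
    estimate = J_T / T
    if u >= 0:
        S2 = sum((grid_T0[k] - grid_T0[k - 1]) ** 2
                 for k in range(1, len(grid_T0)))               # (1.5)
        D2 = sum(b * b for b in jumps_T0)                        # (1.6)
        estimate += (S2 - D2) / T0                               # sigma^2 part
    return estimate


print(second_estimate(0.0, [1.0, -2.0], 2.0, [0.0, 1.0, 3.0], [1.0], 1.0))  # about 4.4
```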

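In §4 a comparison is made of the two estimates for $G(u)$ obtained in §§2 and 3. It is found that both estimates agree in a special limiting case. In particular, it is proved that

$$\operatorname*{p\,lim}_{n\to\infty} G_{n,n}(u) = \lim_{n\to\infty} G_{n,1}(u) \tag{1.10}$$

for all $u$ at which the right-hand side is continuous, where p lim denotes the limit in probability.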





PARAMETERS OF A DIFFERENTIAL PROCESS 643 


Finally in §5 a consistent and unbiased estimate is derived for the "trend term", $\gamma$. This estimate is

$$\hat\gamma(t) = \frac{1}{t}\Big[X(t) - \int_{-\infty}^{+\infty}\frac{b^3}{1+b^2}\,dN_t(b)\Big]. \tag{1.11}$$

It is consistent in the sense that

$$P\Big\{\lim_{t\to\infty}\hat\gamma(t) = \gamma\Big\} = 1. \tag{1.12}$$

Another way of writing this estimate is as follows. Let $J_1, J_2, \ldots, J_n, \ldots$ denote the discontinuities of the sample curve up to time $t$ (not necessarily in order); then

$$\hat\gamma(t) = \frac{1}{t}\Big[X(t) - \sum_{i}\frac{J_i^3}{1+J_i^2}\Big].$$
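The jump-sum form of $\hat\gamma(t)$ is easy to try numerically. The sketch below simulates a hypothetical process with normal variance $\sigma^2$ per unit time and finitely many equally likely jump sizes arriving at rate $\lambda$ (both assumptions made only for illustration), choosing the drift so that $\gamma$ is the trend term of (1.1), and then evaluates $\hat\gamma(t)$ for a large $t$.

```python
import math
import random


def simulate_jump_diffusion(t, gamma, sigma, lam, sizes, rng):
    """Return X(t) and the list of jump sizes on [0, t] for a hypothetical
    process with trend gamma (in the sense of (1.1)), normal variance
    sigma^2 per unit time, and equally likely jump sizes at rate lam."""
    # Drift correction coming from the term iux/(1+x^2) in (1.1).
    compensator = lam * sum(s / (1 + s * s) for s in sizes) / len(sizes)
    jumps = []
    clock = rng.expovariate(lam)
    while clock <= t:
        jumps.append(rng.choice(sizes))
        clock += rng.expovariate(lam)
    x_t = (gamma - compensator) * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0) + sum(jumps)
    return x_t, jumps


def gamma_hat(t, x_t, jumps):
    """Trend estimate: (1/t)[X(t) - sum_i J_i^3 / (1 + J_i^2)]."""
    return (x_t - sum(j ** 3 / (1 + j * j) for j in jumps)) / t


rng = random.Random(12345)
x_t, jumps = simulate_jump_diffusion(10000.0, 0.8, 1.0, 2.0, [-1.5, 0.5, 2.0], rng)
print(gamma_hat(10000.0, x_t, jumps))  # close to 0.8 for large t
```

Without the drift correction the simulated process would have a different trend term, and $\hat\gamma(t)$ would converge to that other value instead.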
2. The first method for estimating G. This method of estimating G is based 
entirely on necessary conditions for one of the most general central limit theorems. 
The statement of the theorem is found on page 121 of Gnedenko and Kolmogorov 
[2], which we restate as follows:
In order that for some suitably chosen constants $A_1, A_2, \ldots, A_n, \ldots$ the distribution functions of the sums

$$X_{n1} + X_{n2} + \cdots + X_{nk_n} - A_n \tag{2.1}$$

of independent infinitesimal random variables converge to a limiting distribution function, it is necessary and sufficient that there exist a bounded, non-decreasing function $G$ such that

$$\sum_{k=1}^{k_n}\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_{nk}(x + a_{nk}) \to G(u) \quad\text{as } n \to \infty \tag{2.2}$$

at all points $u$ at which $G$ is continuous, i.e., at all $u \in C(G)$.

In this theorem, $F_{nk}(x)$ is the distribution function of $X_{nk}$, and $a_{nk} = \int_{|x|<\tau} x\,dF_{nk}(x)$ for arbitrary $\tau > 0$. One result of the theorem is that $A_n$ can be selected as $A_n = a_{n1} + a_{n2} + \cdots + a_{nk_n}$. The limiting distribution referred to in the theorem is necessarily infinitely divisible, and the logarithm of its characteristic function is of the form

$$\log f(u) = i\gamma u + \int_{-\infty}^{+\infty}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x). \tag{2.3}$$

The function $G$ in (2.3) is the same function $G$ as in (2.2). The couple $(\gamma, G)$ determines and is determined by this limiting distribution. In case the limiting distribution has finite variance, it is known that there exist a real constant $\alpha$ and a bounded non-decreasing function $H(x)$ such that

$$\log f(u) = i\alpha u + \int_{-\infty}^{+\infty}\big(e^{iux} - 1 - iux\big)\frac{1}{x^2}\,dH(x). \tag{2.4}$$







In this case the limiting distribution determines and is determined by the couple $(\alpha, H)$. It is easily verified that the limiting distribution has finite variance if and only if

$$\int_{-\infty}^{+\infty}(1 + x^2)\,dG(x) < \infty. \tag{2.5}$$

In this case the relation between $G$ and $H$ becomes

$$H(u) = \int_{-\infty}^{u}(1 + x^2)\,dG(x). \tag{2.6}$$

Also

$$\alpha = \gamma + \int_{-\infty}^{+\infty} x\,dG(x). \tag{2.7}$$
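Relations (2.6) and (2.7) can be checked numerically when $G$ has finitely many atoms (an illustrative assumption, under which (2.5) holds automatically). The sketch evaluates (2.3) directly and (2.4) with $H$ and $\alpha$ obtained from (2.6) and (2.7); the two values should agree up to rounding. The function names are hypothetical.

```python
import cmath


def _levy_integrand(u, x):
    """Integrand of (2.3), with its limit -u^2/2 at x = 0."""
    if x == 0:
        return -u * u / 2
    return (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x * x)) * (1 + x * x) / (x * x)


def log_cf_levy(u, gamma, atoms):
    """(2.3) for a step function G given as (x, mass) pairs."""
    return 1j * gamma * u + sum(_levy_integrand(u, x) * m for x, m in atoms)


def log_cf_kolmogorov(u, gamma, atoms):
    """(2.4), with alpha from (2.7) and the masses of H from (2.6)."""
    alpha = gamma + sum(x * m for x, m in atoms)
    total = 1j * alpha * u
    for x, m in atoms:
        h_mass = (1 + x * x) * m              # dH = (1 + x^2) dG
        if x == 0:
            total += -u * u / 2 * h_mass      # limit of (e^{iux}-1-iux)/x^2
        else:
            total += (cmath.exp(1j * u * x) - 1 - 1j * u * x) / (x * x) * h_mass
    return total
```

The agreement rests on the identity $\big(e^{iux}-1-\frac{iux}{1+x^2}\big)\frac{1+x^2}{x^2} = (e^{iux}-1-iux)\frac{1+x^2}{x^2} + iux$, whose last term produces the shift from $\gamma$ to $\alpha$ in (2.7).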


In the case of the differential process $X(t)$, let

$$X_{nk} = X\Big(\frac{k}{n}\Big) - X\Big(\frac{k-1}{n}\Big), \qquad k = 1, 2, \ldots, n. \tag{2.8}$$

Letting $n = 1, 2, \ldots$, we see that we have an infinitesimal system of independent random variables for which

$$X_{n1} + X_{n2} + \cdots + X_{nn} = X(1)$$

for every $n$, and hence the distribution of $X(1)$ is the limit law of the distributions of this sequence of sums. In this case, $F_{nk}(x)$ is for fixed $n$ the same for all $k$ and is denoted by $F_n(x)$; likewise, $a_{nk} = a_n$ for $1 \le k \le n$. Then by (2.2),

$$\bar G_n(u) = n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x + a_n) \to G(u) \quad\text{as } n \to \infty \tag{2.9}$$

for all $u \in C(G)$. The only deterrent to establishing an estimate for $G(u)$ is the presence of $a_n$ in (2.9). The problem then remains to eliminate the need for $a_n$; i.e., letting

$$G_n(u) = n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x), \tag{2.10}$$

the problem is to show that $G_n(u) \to G(u)$ as $n \to \infty$ for all $u \in C(G)$. Accordingly, let

$$G_n^{**}(u) = n\int_{-\infty}^{u}\frac{(x - a_n)^2}{1 + (x - a_n)^2}\,dF_n(x).$$

We now prove

(2.11) Lemma: $G_n^{**}(u) \to G(u)$ as $n \to \infty$ for all $u \in C(G)$.





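Indeed, for fixed $u \in C(G)$ and arbitrary $\varepsilon > 0$ there exist a $\delta > 0$ and an integer $N$ such that $u \pm \delta \in C(G)$,

$$G(u + \delta) - G(u - \delta) < \varepsilon/2,$$

$$\big|\{\bar G_n(u + \delta) - \bar G_n(u - \delta)\} - \{G(u + \delta) - G(u - \delta)\}\big| < \varepsilon/2,$$

and $|a_n| < \delta$ for all $n > N$, where $\bar G_n$ denotes the left-hand side of (2.9). Since $G_n^{**}(u) = \bar G_n(u - a_n)$, for all $n > N$

$$\begin{aligned}\big|G_n^{**}(u) - \bar G_n(u)\big| &= n\Big|\int_{u}^{u + a_n}\frac{(x - a_n)^2}{1 + (x - a_n)^2}\,dF_n(x)\Big| = \big|\bar G_n(u) - \bar G_n(u - a_n)\big|\\ &\le \big|\{\bar G_n(u + \delta) - \bar G_n(u - \delta)\} - \{G(u + \delta) - G(u - \delta)\}\big| + |G(u + \delta) - G(u - \delta)| < \varepsilon,\end{aligned}$$

which, together with (2.9), proves the lemma.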
We now prove:

(2.12) Lemma: If $u < 0$ and if $u \in C(G)$, then $G_n(u) \to G(u)$ as $n \to \infty$.

In order to prove this we consider the function

$$f_n(x) = \frac{x^2}{1+x^2}\cdot\frac{1 + (x - a_n)^2}{(x - a_n)^2}.$$

(For all sufficiently large $n$, $f_n(x)$ is finite for all $x \le u$.) Then

$$G_n(u) = \int_{-\infty}^{u} f_n(x)\,dG_n^{**}(x),$$

and

$$|G_n(u) - G(u)| \le \Big|\int_{-\infty}^{u}\big(f_n(x) - 1\big)\,dG_n^{**}(x)\Big| + \Big|\int_{-\infty}^{u} dG_n^{**}(x) - G(u)\Big| \le \sup_{x \le u}|f_n(x) - 1|\,G_n^{**}(u) + |G_n^{**}(u) - G(u)|,$$

which converges to zero as $n \to \infty$ because of Lemma (2.11) and the fact that $f_n(x)$ converges uniformly to 1 over any closed set not containing zero.
In precisely the same way one can prove

(2.13) Lemma: If $0 < a < b$ and if $a, b \in C(G)$, then $G_n(b) - G_n(a) \to G(b) - G(a)$ as $n \to \infty$.

With Lemmas (2.12) and (2.13) we can now prove the following:

(2.14) Theorem: If $u \in C(G)$, then $G_n(u) \to G(u)$.

Proof: Because of Lemma (2.12) we need only prove this in the case where $u > 0$. From the two inequalities







$$\begin{aligned} n\int_{-\tau}^{\tau}\frac{(x - a_n)^2}{1 + (x - a_n)^2}\,dF_n(x) &\le n\int_{-\tau}^{\tau}(x - a_n)^2\,dF_n(x)\\ &= n\int_{-\tau}^{\tau}x^2\,dF_n(x) - 2na_n^2 + na_n^2\{F_n(\tau) - F_n(-\tau)\}\\ &\le n\int_{-\tau}^{\tau}x^2\,dF_n(x) - na_n^2 \end{aligned} \tag{2.15}$$

(the equality uses $\int_{-\tau}^{\tau} x\,dF_n(x) = a_n$) and

$$n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x) \le n\int_{-\tau}^{\tau}x^2\,dF_n(x) \le (1 + \tau^2)\,n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x), \tag{2.16}$$

one obtains

$$\begin{aligned} n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x) &\le n\int_{-\tau}^{\tau}x^2\,dF_n(x) = n\int_{-\tau}^{\tau}(x - a_n)^2\,dF_n(x) + \{2 - F_n(\tau) + F_n(-\tau)\}na_n^2\\ &\le \{1 + (\tau + |a_n|)^2\}\,n\int_{-\tau}^{\tau}\frac{(x - a_n)^2}{1 + (x - a_n)^2}\,dF_n(x) + \{2 - F_n(\tau) + F_n(-\tau)\}na_n^2,\\ n\int_{-\tau}^{\tau}\frac{(x - a_n)^2}{1 + (x - a_n)^2}\,dF_n(x) &\le (1 + \tau^2)\,n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x). \end{aligned} \tag{2.17}$$


Now, in the particular situation of this differential process, it is easily seen that the sequence of constants $\{A_n\}$ must necessarily be convergent, and hence the $A_n = na_n$ are bounded. This in turn implies that $a_n \to 0$ as $n \to \infty$ and consequently

$$na_n^2 \to 0 \quad\text{as } n \to \infty. \tag{2.18}$$


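From the inequality (2.17), with $\tau$ always selected such that $\tau, -\tau \in C(G)$, together with Lemma (2.11) and (2.18), one easily obtains

$$\frac{G(\tau) - G(-\tau)}{1 + \tau^2} \le \liminf_{n\to\infty} n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x) \le \limsup_{n\to\infty} n\int_{-\tau}^{\tau}\frac{x^2}{1+x^2}\,dF_n(x) \le (1 + \tau^2)\{G(\tau) - G(-\tau)\}.$$

Now let $0 < \tau < u$. Then, by Lemmas (2.12) and (2.13),

$$\limsup_{n\to\infty} n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x) \le G(u) - G(\tau) + (1 + \tau^2)\{G(\tau) - G(-\tau)\} + G(-\tau),$$

$$\liminf_{n\to\infty} n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x) \ge G(u) - G(\tau) + \frac{G(\tau) - G(-\tau)}{1 + \tau^2} + G(-\tau),$$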







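and taking the limit of both sides as $\tau \to 0$ (through values with $\pm\tau \in C(G)$), one obtains

$$\limsup_{n\to\infty} n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x) \le G(u).$$

In a similar fashion,

$$\liminf_{n\to\infty} n\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dF_n(x) \ge G(u),$$

and the theorem is proved.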
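Theorem (2.14) can be illustrated in the simplest case, Brownian motion with drift $\mu$ and variance $\sigma^2$, for which $G(u) = 0$ when $u < 0$ and $G(u) = \sigma^2$ when $u \ge 0$. Here $X_{n1}$ is normal with mean $\mu/n$ and variance $\sigma^2/n$, and the sketch below computes $G_n(u)$ of (2.10) by Simpson's rule. The example process, the truncation at 12 standard deviations, and the step count are choices made for illustration only.

```python
import math


def G_n_brownian(u, n, mu, sigma, steps=4000):
    """G_n(u) = n E[X^2/(1+X^2); X <= u] for X ~ N(mu/n, sigma^2/n),
    computed with composite Simpson's rule (steps must be even)."""
    m, s = mu / n, sigma / math.sqrt(n)
    lo, hi = m - 12 * s, min(u, m + 12 * s)   # negligible mass outside
    if hi <= lo:
        return 0.0

    def f(x):
        density = math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        return x * x / (1 + x * x) * density

    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return n * total * h / 3


for n in (4, 100, 10000):
    print(n, G_n_brownian(1.0, n, 1.0, 1.0))  # approaches sigma^2 = 1
```

For $u = 1$ the values approach $G(1) = \sigma^2$, while for $u = -1$ they are already zero at moderate $n$, matching $G(-1) = 0$.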
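Theorem (2.14) asserts that

$$n\,\mathcal{E}\Big\{\frac{X_{n1}^2}{1+X_{n1}^2}\,I_{[X_{n1}\le u]}\Big\} \to G(u)$$

as $n \to \infty$ for every $u \in C(G)$, or, since the $X_{nk}$ are identically distributed,

$$G_n(u) = n\,\mathcal{E}\Big\{\frac{X_{nk}^2}{1+X_{nk}^2}\,I_{[X_{nk}\le u]}\Big\} \to G(u)$$

as $n \to \infty$ for all $u \in C(G)$. By the strong law of large numbers,

$$G_{N,n}(u) = \frac{n}{N}\sum_{k=1}^{N}\frac{X_{nk}^2}{1+X_{nk}^2}\,I_{[X_{nk}\le u]} \to G_n(u) \tag{2.19}$$

as $N \to \infty$ for every value of $u$ with probability one, for every fixed $n$. Since $G_n(u)$ and $G_{N,n}(u)$ are nondecreasing in $u$, one can then write

$$P\Big\{\lim_{n\to\infty}\lim_{N\to\infty} G_{N,n}(u) = G(u) \text{ for all } u \in C(G)\Big\} = 1. \tag{2.20}$$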

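The estimate to be used is $G_{N,n}(u)$ as defined in (2.19). It is strongly consistent in the sense given by (2.20). This estimate, however, is not necessarily an unbiased estimate of $G(u)$ for all values of $u \in C(G)$. This is the case when $X(t)$ is a pure Poisson process, i.e.,

$$\log f_{X(t)}(u) = iuat + \lambda t\big(e^{iub} - 1\big),$$

where, say, $a > 0$ and $b > 0$. In this case

$$G(x) = \begin{cases}0 & \text{if } x < b,\\ \lambda b^2(1 + b^2)^{-1} & \text{if } x \ge b.\end{cases}$$

If $u < 0$, then for every $n, N$, it is easily checked that $\mathcal{E}(G_{N,n}(u)) = 0$. However, if $0 < x < b$, then for those values of $n$ for which $0 < a/n < x$, one obtains

$$\mathcal{E}\big(G_{N,n}(x)\big) = \frac{na^2}{n^2 + a^2}\,e^{-\lambda/n} > 0 \quad\text{for all } N,$$

and if $x > b$, then for those values of $n$ for which $a/n + b \le x$,

$$\mathcal{E}\big(G_{N,n}(x)\big) = ne^{-\lambda/n}\Big\{\frac{a^2}{n^2 + a^2} + \frac{(a + nb)^2}{n^2 + (a + nb)^2}\cdot\frac{\lambda}{n} + \sum_{i=2}^{m_n}\frac{(\lambda/n)^i}{i!}\cdot\frac{(a + nib)^2}{n^2 + (a + nib)^2}\Big\}, \qquad m_n = \Big[\frac{nx - a}{nb}\Big],$$

which in general differs from $\lambda b^2(1 + b^2)^{-1} = G(x)$.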
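For the pure Poisson example, $X_{nk} = a/n + ib$ with probability $e^{-\lambda/n}(\lambda/n)^i/i!$, so $\mathcal{E}(G_{N,n}(x))$ can be summed exactly. The sketch below does this (the function name is hypothetical) and reproduces the closed form $na^2e^{-\lambda/n}/(n^2 + a^2)$ for $0 < a/n < x < b$, where $G(x) = 0$, which exhibits the bias.

```python
import math


def expected_first_estimate_poisson(x, n, a, b, lam):
    """E G_{N,n}(x) (the same for every N) when X(t) = a t + b P(t),
    P a Poisson process of rate lam, a > 0, b > 0.
    X_{nk} = a/n + i b with probability e^{-lam/n} (lam/n)^i / i!."""
    total, i = 0.0, 0
    p = math.exp(-lam / n)                  # probability of i = 0 jumps
    while a / n + i * b <= x:               # only values X_{nk} <= x count
        y = a / n + i * b
        total += p * y * y / (1 + y * y)
        i += 1
        p *= (lam / n) / i                  # Poisson(lam/n) probability of i jumps
    return n * total


# a = 2, b = 1, lam = 3, n = 10; G(0.5) = 0 but the expectation is positive.
print(expected_first_estimate_poisson(0.5, 10, 2.0, 1.0, 3.0))
```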







3. The second method of estimating G. Because $X$ is a differential process, once the probability distribution is given for, say, $X(1)$, it is known for $X(t)$ for every $t$. Furthermore, for every finite set of values of $t$, say, $t_1, t_2, \ldots, t_m$, the joint distribution of the random variables $X(t_1), X(t_2), \ldots, X(t_m)$ can be derived from the distribution of $X(1)$. From a probabilistic point of view, two stochastic processes are the same if their corresponding finite-dimensional marginal distributions agree. Accordingly, it is found convenient to construct and prove properties of an estimate of $G$ by constructing a process equivalent to $X$ which, because of the equivalence, shall be labeled $X$.

Using the $\gamma$ and $G(x)$ in (1.1), a stochastic process in two variables, $Y(b, t)$, is considered, where $b \in (-\infty, +\infty)$ and $t \in [0, \infty)$. It is assumed that $Y(b, t)$ is a process with independent "generalized increments" on the $(b, t)$-half plane over which it is defined, i.e., if $\{A_\mu : \mu \in M\}$ is a disjoint family of Borel sets on this plane, then the random variables

$$\iint_{A_\mu} Y(db, dt), \qquad \mu \in M,$$

are independent. The probability distribution of $Y(b, t)$ is assumed to have a characteristic function $f_{Y(b,t)}(u)$ for which

$$\log f_{Y(b,t)}(u) = itu\gamma\,\frac{G(b) - \sigma^2 I_{[b \ge 0]}}{G(+\infty) - \sigma^2} - \frac{tu^2\sigma^2}{2}\,I_{[b \ge 0]} + t\int_{\substack{x \le b\\ x \ne 0}}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x), \tag{3.1}$$

where $\sigma^2 = G(+0) - G(-0)$ is actually the variance of the normal component of $X(1)$. (In case $G(+\infty) = \sigma^2$, the results stated in this section concerning the estimates of $G(x)$ hold trivially without using $Y(b, t)$.) Then for every $t$ we have

$$X(t) = \int_{-\infty}^{+\infty} Y(db, t), \tag{3.2}$$


the integral existing with probability one since it converges in distribution. From the process $Y(b, t)$ we construct a sequence of independent stochastic processes, each one in this sequence being a process with independent increments. Let

$$Y_0(t) = Y(+0, t) - Y(-0, t), \tag{3.3}$$

i.e., $Y_0(t)$ is a normal process with mean zero and variance $t\sigma^2$, and let, for fixed positive integer $n$,

$$Y_{kn}(t) = \begin{cases} Y\big(\tfrac{1}{n}, t\big) - Y(+0, t) & \text{if } k = 1,\\[1ex] Y\big(\tfrac{k}{n}, t\big) - Y\big(\tfrac{k-1}{n}, t\big) & \text{if } k = 0, -1, \pm 2, \pm 3, \ldots,\end{cases} \tag{3.4}$$

where, for $k = 0$, $Y(0, t)$ is to be read as $Y(-0, t)$. Then for every $t$







$$X(t) = Y_0(t) + \sum_{k=-\infty}^{+\infty} Y_{kn}(t) \tag{3.5}$$

with probability one, since this series of independent random variables converges in distribution.

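The most difficult problem that occurs in this section is to find an estimate for the saltus of $G(x)$ at $x = 0$, i.e., to find an estimate of the variance of the normal component, $Y_0(t)$, of the process. Let $k_1, k_2, \ldots, k_N, \ldots$ denote a sequence of integers such that

$$\sum_{N} k_N^{-1} < \infty. \tag{3.6}$$

Also let

$$Y(t) = X(t) - Y_0(t), \quad\text{or}\quad Y(t) = \int_{b \ne 0} Y(db, t) = \sum_{k=-\infty}^{+\infty} Y_{kn}(t), \tag{3.7}$$

and let

$$Q_N^2 = \sum_{j=1}^{k_N}\Big\{X\Big(\frac{j}{k_N}\Big) - X\Big(\frac{j-1}{k_N}\Big)\Big\}^2 - \sum_{j=1}^{k_N}\Big\{Y\Big(\frac{j}{k_N}\Big) - Y\Big(\frac{j-1}{k_N}\Big)\Big\}^2. \tag{3.8}$$

We can now prove

(3.9) Lemma: $P\{Q_N^2 \to \sigma^2 \text{ as } N \to \infty\} = 1$.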


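Proof: From (3.7) and (3.8),

$$Q_N^2 = \sum_{j=1}^{k_N}\Big\{Y_0\Big(\frac{j}{k_N}\Big) - Y_0\Big(\frac{j-1}{k_N}\Big)\Big\}^2 + 2Z_N,$$

where

$$Z_N = \sum_{j=1}^{k_N}\Big\{\Big(Y_0\Big(\frac{j}{k_N}\Big) - Y_0\Big(\frac{j-1}{k_N}\Big)\Big)\sum_{k=-\infty}^{+\infty}\Big(Y_{kn}\Big(\frac{j}{k_N}\Big) - Y_{kn}\Big(\frac{j-1}{k_N}\Big)\Big)\Big\}. \tag{3.10}$$

Since $Q_N^2 - 2Z_N$ has mean $\sigma^2$ and variance $2\sigma^4/k_N$, it converges to $\sigma^2$ with probability one (by (3.6), Chebyshev's inequality and the Borel-Cantelli lemma). It remains to prove that $Z_N \to 0$ as $N \to \infty$ with probability one. Toward this end, let, for every fixed $k$,

$$Z_{Nk} = \sum_{j=1}^{k_N}\Big\{Y_0\Big(\frac{j}{k_N}\Big) - Y_0\Big(\frac{j-1}{k_N}\Big)\Big\}\Big\{Y_{kn}\Big(\frac{j}{k_N}\Big) - Y_{kn}\Big(\frac{j-1}{k_N}\Big)\Big\}. \tag{3.11}$$

The expectation of each summand is zero, since it is the product of two independent random variables of finite expectation and $\mathcal{E}Y_0(t) = 0$. Hence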







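$$\mathcal{E}Z_{Nk} = 0. \tag{3.12}$$

We may write $Z_{Nk} = \sum_{j=1}^{k_N} U_jV_j$, where $U_j = Y_0(j/k_N) - Y_0((j-1)/k_N)$ and $V_j = Y_{kn}(j/k_N) - Y_{kn}((j-1)/k_N)$. Then one easily obtains $\operatorname{Var}(Z_{Nk}) = (1/k_N)\sigma^2\sum_{j=1}^{k_N}\mathcal{E}V_j^2 = \sigma^2\,\mathcal{E}(V_1^2)$. But

$$\mathcal{E}(V_1^2) = \frac{1}{k_N}\int_{(k-1)/n}^{k/n}(1 + x^2)\,dG(x) + \frac{1}{k_N^2}\Big(\alpha_{kn} + \int_{(k-1)/n}^{k/n} x\,dG(x)\Big)^2,$$

where

$$\alpha_{kn} = \gamma\,\frac{G(k/n) - G((k-1)/n)}{G(+\infty) - \sigma^2}.$$

Hence for fixed $k$

$$\operatorname{Var}(Z_{Nk}) = \frac{\sigma^2}{k_N}\Big(R + \frac{S}{k_N}\Big), \tag{3.13}$$

where $R$ and $S$ are both finite and do not depend on $N$. Hence, for arbitrary $\varepsilon > 0$ and because of (3.6), (3.13) and Chebyshev's inequality,

$$\sum_{N=1}^{\infty} P\{|Z_{Nk} - 0| > \varepsilon\} \le \frac{1}{\varepsilon^2}\sum_{N=1}^{\infty}\operatorname{Var}(Z_{Nk}) < \infty, \tag{3.14}$$

which in turn implies (by applying the Borel-Cantelli lemma) that

$$Z_{Nk} \to 0 \quad\text{as } N \to \infty \tag{3.14a}$$

with probability one for every fixed $k$.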
We now show that $Z_N = \sum_{k=-\infty}^{+\infty} Z_{Nk}$ converges with probability one to zero as $N \to \infty$. From (3.1) and (3.4), we have, for $k \ne 0$, $k \ne 1$,

$$\log f_{Y_{kn}(t)}(u) = itu\beta_{kn} + t\int_{(k-1)/n}^{k/n}\big(e^{iux} - 1\big)\frac{1+x^2}{x^2}\,dG(x), \tag{3.15}$$

where

$$\beta_{kn} = \gamma\,\frac{G(k/n) - G((k-1)/n)}{G(+\infty) - \sigma^2} - \int_{(k-1)/n}^{k/n}\frac{1}{x}\,dG(x).$$

Let

$$W_{kn}(t) = Y_{kn}(t) - \beta_{kn}t. \tag{3.16}$$

Then since

$$B = \sum_{k \ne 0,1}|\beta_{kn}| \le |\gamma| + n\{G(+\infty) - \sigma^2\} < \infty, \tag{3.17}$$

the series of independent random variables $\sum_{k \ne 0,1} W_{kn}(t)$ converges in law and hence converges with probability one. This implies that $|W_{kn}(t)| \ge \varepsilon$ for only a finite number of values of $k$ with probability one for arbitrary $\varepsilon > 0$, in particular for $\varepsilon = 1/n$. Also note that if $|W_{kn}(t)| > 0$, then $|W_{kn}(t)| \ge (|k| - 1)/n$. Now let $A_i$ denote the event that $i$ is the largest value of $|k| \ge 2$ such that $W_{kn}(t) \ne 0$, i.e., $|W_{kn}(t)| \ge 1/n$, and let $A_1$ denote the event that $W_{kn}(t) = 0$ for all $|k| \ge 2$. Because $|W_{kn}(t)| \ge 1/n$ for only a finite number of values of $k$ with probability one, we obtain

$$\sum_{i=1}^{\infty} P(A_i) = P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = 1. \tag{3.17a}$$

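Now we may write

$$Z_N = S_N + T_N + Z_{N0} + Z_{N1}, \tag{3.18}$$

where

$$S_N = \sum_{j=1}^{k_N}\Big\{\Big(Y_0\Big(\frac{j}{k_N}\Big) - Y_0\Big(\frac{j-1}{k_N}\Big)\Big)\sum_{k \ne 0,1}\Big(W_{kn}\Big(\frac{j}{k_N}\Big) - W_{kn}\Big(\frac{j-1}{k_N}\Big)\Big)\Big\},$$

$$T_N = \frac{1}{k_N}\Big(\sum_{k \ne 0,1}\beta_{kn}\Big)\sum_{j=1}^{k_N}\Big\{Y_0\Big(\frac{j}{k_N}\Big) - Y_0\Big(\frac{j-1}{k_N}\Big)\Big\}.$$

Because of (3.17) and the fact that $\mathcal{E}T_N = 0$,

$$\sum_{N=1}^{\infty} P\{|T_N| \ge \varepsilon\} \le \frac{B^2\sigma^2}{\varepsilon^2}\sum_{N=1}^{\infty} k_N^{-2} < \infty,$$

and consequently $T_N \to 0$ as $N \to \infty$ with probability one. By (3.14a), $Z_{N0} + Z_{N1} \to 0$ as $N \to \infty$ with probability one. Now

$$S_N = \sum_{i=1}^{M} S_N I_{A_i} + S_N I_{\bigcup_{i=M+1}^{\infty} A_i},$$

where, as before, $I_S$ is the indicator of $S$. By (3.14a), $P\{S_N I_{A_i} \to 0 \text{ as } N \to \infty\} = 1$ for every integer $i$. Hence $Z_N \to 0$ as $N \to \infty$ over the set $\bigcup_{i=1}^{M} A_i$ for every $M$ except for a set of measure zero, and hence we can conclude by (3.17a) that $P\{Z_N \to 0 \text{ as } N \to \infty\} = 1$. Thus the lemma is proved.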

Unfortunately, $Q_N^2$ is not an observable random variable. In the definition of $Q_N^2$ in (3.8), the part that is not observable is

$$L_N^2 = \sum_{j=1}^{k_N}\Big\{Y\Big(\frac{j}{k_N}\Big) - Y\Big(\frac{j-1}{k_N}\Big)\Big\}^2.$$

We show now that $L_N^2$ converges with probability one to a bona fide random variable $D^2$, which is the sum of the squares of the "jumps", i.e., $L_N^2 \to D^2 = \int_0^1 (Y(dt))^2$. From a practical standpoint $D^2$ can be considered as "observable", while $L_N^2$ is observable only if $\sigma^2 = 0$. Thus we prove

(3.19) Lemma: $L_N^2 \to D^2$ as $N \to \infty$ and $D^2$ is finite, with probability one.
Proof: Let us define


(3.20) 







Then from (3.6) and (3.20) we obtain

$$\sum_n M_n^2 < \infty \quad\text{and}\quad \sum_n k_n^{-1}M_n^{-4} < \infty. \tag{3.21}$$

Then for every positive integer $n$ we define two independent differential processes, $U_n(t)$ and $V_n(t)$, for which the logarithms of the characteristic functions are

$$\log f_{V_n(t)}(u) = t\Big\{\int_{-\infty}^{-M_n} + \int_{M_n}^{+\infty}\Big\}\big(e^{iux} - 1\big)\frac{1+x^2}{x^2}\,dG(x)$$

and

$$\log f_{U_n(t)}(u) = itu\eta_n + t\int_{0<|x|<M_n}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x),$$

where

$$\eta_n = \gamma - \Big\{\int_{-\infty}^{-M_n} + \int_{M_n}^{+\infty}\Big\}\frac{1}{x}\,dG(x).$$

Clearly one can (equivalently) write $Y(t) = U_n(t) + V_n(t)$ for every $n$, and $V_n(t)$ has sample functions which are step functions, the jumps of which in absolute value are not less than $M_n$. Now let


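$$\begin{aligned} R_n &= \sum_{j=1}^{k_n}\Big\{U_n\Big(\frac{j}{k_n}\Big) - U_n\Big(\frac{j-1}{k_n}\Big)\Big\}^2,\\ S_n &= \sum_{j=1}^{k_n}\Big\{U_n\Big(\frac{j}{k_n}\Big) - U_n\Big(\frac{j-1}{k_n}\Big)\Big\}\Big\{V_n\Big(\frac{j}{k_n}\Big) - V_n\Big(\frac{j-1}{k_n}\Big)\Big\},\\ T_n &= \sum_{j=1}^{k_n}\Big\{V_n\Big(\frac{j}{k_n}\Big) - V_n\Big(\frac{j-1}{k_n}\Big)\Big\}^2. \end{aligned} \tag{3.22}$$

Then

$$L_n^2 = R_n + 2S_n + T_n. \tag{3.23}$$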


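For $\varepsilon > 0$, we define

$$G_\varepsilon(x) = \begin{cases} G(x) & \text{if } x \le -\varepsilon,\\ G(-\varepsilon) & \text{if } |x| < \varepsilon,\\ G(x) - G(\varepsilon) + G(-\varepsilon) & \text{if } x \ge \varepsilon.\end{cases}$$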


The proof of the lemma will be accomplished by proving that the following three statements are true with probability one:

i) for sufficiently large values of $n$, $T_n$ is the sum of the squares of the jumps during $[0, 1]$ which are in absolute value $\ge M_n$;

ii) $R_n \to 0$ as $n \to \infty$; and

iii) $\lim_n T_n$ is finite.

Once we prove i), ii), and iii), the lemma will follow easily. For by i) and (3.20), we have that $T_n \to D^2 = \int_0^1 (Y(dt))^2$ as $n \to \infty$ with probability one. By the Cauchy inequality, $S_n^2 \le R_nT_n$, and thus because of ii) and iii), we have that $S_n \to 0$ as $n \to \infty$ with probability one. Hence the lemma would easily follow.
We first prove ii). Let $g(u)$ denote the characteristic function of

$$U_{jn} = U_n\Big(\frac{j}{k_n}\Big) - U_n\Big(\frac{j-1}{k_n}\Big).$$

Then

$$\log g(u) = iuk_n^{-1}\eta_n + k_n^{-1}\int_{0<|x|<M_n}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x).$$

The first four semi-invariants of $U_{jn}$ are

$$\begin{aligned} \kappa_1 &= k_n^{-1}\Big\{\eta_n + \int_{0<|x|<M_n} x\,dG(x)\Big\}, & \kappa_2 &= k_n^{-1}\int_{0<|x|<M_n}(1 + x^2)\,dG(x),\\ \kappa_3 &= k_n^{-1}\int_{0<|x|<M_n} x(1 + x^2)\,dG(x), & \kappa_4 &= k_n^{-1}\int_{0<|x|<M_n} x^2(1 + x^2)\,dG(x). \end{aligned}$$

Since $\operatorname{Var}(U_{jn}^2) = \kappa_4 + 2\kappa_2^2 + 4\kappa_1\kappa_3 + 4\kappa_1^2\kappa_2$, we obtain

$$\operatorname{Var}R_n = \int_{0<|x|<M_n} x^2(1 + x^2)\,dG(x) + \frac{1}{k_n}B_n,$$

where the sequence of $B_n$'s is bounded by a finite number, say $B$. Thus

$$\sum_n \operatorname{Var}R_n \le \sum_n\{G(M_n) - G(-M_n)\}(M_n^2 + M_n^4) + B\sum_n k_n^{-1} < \infty \tag{3.24}$$

because of (3.21). Consequently (3.24) implies that $P\{R_n - \mathcal{E}R_n \to 0 \text{ as } n \to \infty\} = 1$. But

$$\mathcal{E}R_n = \int_{0<|x|<M_n}(1 + x^2)\,dG(x) + k_n^{-1}\Big\{\eta_n + \int_{0<|x|<M_n} x\,dG(x)\Big\}^2.$$

By (3.21), $\mathcal{E}R_n \to 0$ as $n \to \infty$.
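The cumulant identity $\operatorname{Var}(U^2) = \kappa_4 + 2\kappa_2^2 + 4\kappa_1\kappa_3 + 4\kappa_1^2\kappa_2$ used above holds for any random variable with finite fourth moment. A quick numerical check, using a Poisson variable (all of whose cumulants equal its mean) as an arbitrary test case; the function names are illustrative.

```python
import math


def poisson_moment(r, lam, terms=200):
    """E[X^r] for X ~ Poisson(lam), by direct summation of the series."""
    p, total = math.exp(-lam), 0.0
    for i in range(terms):
        total += (i ** r) * p
        p *= lam / (i + 1)
    return total


def var_of_square_from_cumulants(k1, k2, k3, k4):
    """Var(X^2) expressed through the first four cumulants of X."""
    return k4 + 2 * k2 ** 2 + 4 * k1 * k3 + 4 * k1 ** 2 * k2


lam = 2.5
direct = poisson_moment(4, lam) - poisson_moment(2, lam) ** 2
print(direct, var_of_square_from_cumulants(lam, lam, lam, lam))  # both 102.5
```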

Hence $P\{R_n \to 0 \text{ as } n \to \infty\} = 1$. Thus ii) is proved. We now prove i). Let $A_n$ denote the event that at least 2 jumps of size (in absolute value) $\ge M_n$ occur in at least one of the $k_n$ subintervals of length $k_n^{-1}$. Then

$$P(A_n) = 1 - \{P[X_n \le 1]\}^{k_n},$$

where $X_n$ denotes the number of jumps of size (in absolute value) $\ge M_n$ during a specific time interval of length $k_n^{-1}$. Now it is known (e.g., Doob [1], page 423) that $X_n$ has a Poisson distribution with parameter

$$\lambda_n = k_n^{-1}\Big\{\int_{-\infty}^{-M_n} + \int_{M_n}^{+\infty}\Big\}\frac{1+x^2}{x^2}\,dG(x).$$







We want to prove that $\sum_n P(A_n) < \infty$. We first note that

$$\lambda_n \le k_n^{-1}\,\frac{1 + M_n^2}{M_n^2}\,K,$$

where $K = G(+\infty) - G(-\infty)$; for simplicity, let $K = 1$. Now $\sum_n P(A_n) = \sum_n\big(1 - \{e^{-\lambda_n}(1 + \lambda_n)\}^{k_n}\big)$ is a convergent series if and only if the infinite product $\Pi = \prod_n\{e^{-\lambda_n}(1 + \lambda_n)\}^{k_n}$ is convergent (i.e., does not diverge to zero). Note that each factor in the product does not exceed 1. Easily, $\Pi$ converges if and only if $P = \sum_n \log\{e^{-\lambda_n}(1 + \lambda_n)\}^{k_n}$ is absolutely convergent. But $P = \sum_n\{-k_n\lambda_n + k_n\log(1 + \lambda_n)\}$, and since $0 \le \lambda - \log(1 + \lambda) \le \lambda^2/2$ for $\lambda \ge 0$,

$$|P| = \sum_n k_n\{\lambda_n - \log(1 + \lambda_n)\} \le \frac{1}{2}\sum_n k_n\lambda_n^2 \le \frac{1}{2}\sum_n k_n^{-1}M_n^{-4}\big(1 + 2M_n^2 + M_n^4\big) < \infty$$

by (3.21). Thus $\sum_n P(A_n) < \infty$. Hence by the Borel-Cantelli lemma, the probability that infinitely many of the $A_n$'s occur is zero. Thus i) is proved. In conclusion, we prove iii). To do this, we simply note that, for every $n$, $V_n(t)$ has only a finite number of jumps with probability one. (The number of these jumps during $[0, 1]$ follows a Poisson distribution with parameter $k_n\lambda_n$.) But also, $U_n(t)$ is a differential process with finite variance and consequently the expectation of the sum of the squares of the jumps is finite. This proves iii). Thus the lemma is proved.
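The bound behind the Borel-Cantelli step in the proof of i) is $P(A_n) = 1 - \{e^{-\lambda_n}(1 + \lambda_n)\}^{k_n} \le k_n\{\lambda_n - \log(1 + \lambda_n)\} \le \tfrac{1}{2}k_n\lambda_n^2$. The sketch below evaluates both ends, using `math.expm1` and `math.log1p` to keep accuracy when $\lambda_n$ is small; the function names are illustrative.

```python
import math


def prob_A(lam, k):
    """P(A_n) = 1 - {e^{-lam} (1 + lam)}^k, computed stably."""
    return -math.expm1(k * (math.log1p(lam) - lam))


def borel_cantelli_bound(lam, k):
    """Upper bound k * lam^2 / 2 used in the proof of i)."""
    return 0.5 * k * lam * lam


for lam in (1e-4, 1e-2, 1.0):
    for k in (1, 1000):
        print(lam, k, prob_A(lam, k), borel_cantelli_bound(lam, k))
```

The first inequality is $1 - e^{-y} \le y$ with $y = k_n\{\lambda_n - \log(1 + \lambda_n)\}$, and the second is the series bound for $\log(1 + \lambda)$ quoted above.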

Having obtained Lemmas (3.9) and (3.19), we obtain the following theorem without additional proof:

Theorem: $\sum_{j=1}^{k_N}\{X(j/k_N) - X((j-1)/k_N)\}^2 - D^2$ converges with probability one to $\sigma^2$ as $N \to \infty$.
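The theorem suggests a direct numerical check: on a fine grid, the sum of squared increments of $X$ minus the sum of squared jumps should be close to $\sigma^2$. The sketch simulates a hypothetical jump-diffusion on $[0, 1]$ with finitely many equally likely jump sizes (an illustrative assumption) on a grid of $k$ cells; the model parameters, seed and names are made up for the example.

```python
import math
import random


def qv_minus_squared_jumps(k, gamma, sigma, lam, sizes, rng):
    """Simulate a hypothetical jump-diffusion on [0, 1] on a grid of k cells
    and return sum_j {X(j/k) - X((j-1)/k)}^2 - D^2."""
    compensator = lam * sum(s / (1 + s * s) for s in sizes) / len(sizes)
    jump_in_cell = [0.0] * k
    d2 = 0.0                                   # D^2: sum of squared jumps
    clock = rng.expovariate(lam)
    while clock < 1.0:
        size = rng.choice(sizes)
        jump_in_cell[min(int(clock * k), k - 1)] += size
        d2 += size * size
        clock += rng.expovariate(lam)
    dt = 1.0 / k
    qv = 0.0                                   # sum of squared grid increments
    for c in jump_in_cell:
        inc = (gamma - compensator) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) + c
        qv += inc * inc
    return qv - d2


rng = random.Random(7)
print(qv_minus_squared_jumps(100000, 0.5, 1.2, 5.0, [-1.0, 0.8, 2.5], rng))  # near 1.2**2 = 1.44
```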

Let $N_t(S)$ denote the number of jumps of the sample function during $[0, t)$ of size in $S$. Then set

$$J_t(x) = \int_{-\infty}^{x}\frac{b^2}{1+b^2}\,dN_t(b). \tag{3.25}$$

Now consider

$$J_t(x) - J_t(y) - t\int_{y}^{x} dG(b) = H_t(x, y). \tag{3.26}$$

If $0 \notin (y, x]$, $N_t(y, x]$ is finite with probability one and

$$\mathcal{E}N_t(y, x] = t\int_{y}^{x}\frac{1+b^2}{b^2}\,dG(b).$$

Then ([1], p. 437) $H_t(x, y)$ is a martingale in both $x$ and $-y$. Since $\mathcal{E}(|H_t(x, y)|) \le 2t\int_{y}^{x} dG(b)$, it follows that the definition of $H_t$ can be extended to $0 \in (y, x)$ and for $y$ converging to $-\infty$. Thus $J_t(x)$ is well defined and an unbiased estimate of $t\int_{-\infty}^{x} dG(b)$ (the integral taken over $b \ne 0$) for all $x$. Hence we obtain the results in Section 1.
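Unbiasedness of $J_t(x)$ can be checked by Monte Carlo for a compound Poisson process with rate $\lambda$ and finitely many equally likely jump sizes; for such a process the jump part of $\int_{-\infty}^{x} dG(b)$ equals $\lambda$ times the average of $b^2/(1+b^2)$ over the sizes $b \le x$. The process, seed and function names below are assumptions made for illustration.

```python
import random


def J(x, jumps):
    """J_t(x) of (3.25): sum of b^2/(1+b^2) over observed jump sizes b <= x."""
    return sum(b * b / (1 + b * b) for b in jumps if b <= x)


def compound_poisson_jumps(t, lam, sizes, rng):
    """Jump sizes during [0, t) for a hypothetical compound Poisson process
    with rate lam and equally likely jump sizes."""
    jumps = []
    clock = rng.expovariate(lam)
    while clock < t:
        jumps.append(rng.choice(sizes))
        clock += rng.expovariate(lam)
    return jumps


def jump_part_of_G(x, lam, sizes):
    """Integral of dG(b) over b <= x, b != 0, for the same process."""
    return lam * sum(b * b / (1 + b * b) for b in sizes if b <= x) / len(sizes)


rng = random.Random(2024)
t, lam, sizes, reps = 1.0, 3.0, [-1.0, 0.5, 2.0], 20000
mean = sum(J(1.0, compound_poisson_jumps(t, lam, sizes, rng)) for _ in range(reps)) / reps
print(mean / t, jump_part_of_G(1.0, lam, sizes))  # both close to 0.7
```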







4. Comparison of the two estimates of G. The two estimates obtained for $G$ are not necessarily equal, since the first estimate can be biased, while the second estimate is unbiased over all intervals which do not contain zero. It does happen that in an intermediate limiting case the two estimates are equal with probability one at all continuity points of the second estimate, and this section is devoted to the proof of this fact. In particular, we prove that

$$\operatorname*{p\,lim}_{n\to\infty} G_{n,n}(u) = \lim_{n\to\infty} G_{n,1}(u) \tag{4.1}$$

for all $u$ at which the right-hand side is continuous, p lim denoting the limit in probability. This can be expressed as follows:

$$\operatorname*{p\,lim}_{n\to\infty} G_{n,n}(u) = \begin{cases}\displaystyle\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dN_1(x) & \text{if } u < 0,\\[2ex] \displaystyle\int_{-\infty}^{u}\frac{x^2}{1+x^2}\,dN_1(x) + \int_0^1 (dX(t))^2 - \int_{-\infty}^{+\infty} x^2\,dN_1(x) & \text{if } u \ge 0,\end{cases} \tag{4.2}$$

for every such $u$. We shall prove the result in the form expressed in (4.2).

Let $Y_\varepsilon(t)$ denote the process formed by the jumps exceeding $\varepsilon$ in absolute value, $Y_0$ the normal component, and $Y^\varepsilon(t)$ the remaining process with the trend term included. As $X_{nk} = X(k/n) - X((k-1)/n)$, we similarly define $Y_{\varepsilon,nk}$, $Y_{0,nk}$, and $Y^\varepsilon_{nk}$. Now

$$nP(|Y_{0,nk}| \ge \alpha) = 2n\int_{\alpha\sqrt{n}/\sigma}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx, \tag{4.3}$$

and hence approaches 0. Also

$$nP(|Y^\varepsilon_{nk}| \ge \alpha) \le \frac{\operatorname{Var}\big(Y^\varepsilon(1)\big)}{(\alpha - \beta/n)^2},$$

where $\beta = \mathcal{E}(|Y^\varepsilon(1)|)$. Thus if $\alpha_n$ approaches 0 sufficiently slowly as $\varepsilon_n$ approaches 0, these both become small and hence

$$nP(|Y_{0,nk}| \ge \alpha_n) + nP(|Y^{\varepsilon_n}_{nk}| \ge \alpha_n) \to 0, \tag{4.4}$$

$$P\big(\text{for some } k,\ |X_{nk} - Y_{\varepsilon_n,nk}| \ge 2\alpha_n\big) \to 0. \tag{4.5}$$

Also if $\varepsilon_n$ approaches 0 sufficiently slowly, we have already seen that the probability that there are at least two jumps of size larger than $\varepsilon_n$ in any interval of length $1/n$ approaches 0. Now let $u < 0$ not be the size of a jump. Then for $n$ sufficiently large, there is no jump whose size lies in $(u - 2\alpha_n, u + 2\alpha_n)$. Thus with large probability,

$$G_{n,n}(u) = \sum\frac{X_{nk}^2}{1 + X_{nk}^2}, \tag{4.6}$$

where the sum is over those $k$'s for which there is a jump of value less than $u$ in $[(k-1)/n, k/n]$, and there is a one-to-one correspondence between such jumps and intervals. Let $v_{nk}$ be the size of the jump corresponding to $X_{nk}$, i.e., $v_{nk} = X_{nk} - Y_{0,nk} - Y^{\varepsilon_n}_{nk}$. Observing that







$$\Big|\frac{x^2}{1+x^2} - \frac{y^2}{1+y^2}\Big| \le |x - y|,$$

with large probability $G_{n,n}(u)$ differs from $\int_{-\infty}^{u}\big(b^2/(1+b^2)\big)\,dN_1(b)$ by less than $2\alpha_n N_1(-\infty, u)$, and hence the first part of (4.2) is proved. Similarly, the second part of (4.2) holds except possibly for an additive constant. To evaluate this constant we consider $G_{n,n}(u) - G_{n,n}(-u)$, where both $u$ and $-u$ are continuity points of $N_1$. Then with large probability,

$$G_{n,n}(u) - G_{n,n}(-u) = \Sigma_1\,\frac{X_{nk}^2}{1 + X_{nk}^2}, \tag{4.7}$$

where the sum $\Sigma_1$ is over those $k$'s for which there is no jump of absolute value exceeding $u$ in $[(k-1)/n, k/n]$. Let $\Sigma_2$ denote the complementary sum. Now

$$\frac{1}{1 + u^2}\,\Sigma_1 X_{nk}^2 \le G_{n,n}(u) - G_{n,n}(-u) \le \Sigma_1 X_{nk}^2. \tag{4.8}$$

But

$$\Big|\Sigma_2 X_{nk}^2 - \int_{|b|>u} b^2\,dN_1(b)\Big| \le 4\alpha_n\int_{|b|>u}\big(|b| + \alpha_n\big)\,dN_1(b) = \delta_n$$

with large probability for $n$ sufficiently large. Thus

$$\frac{1}{1 + u^2}\Big(\sum_k X_{nk}^2 - \int_{|b|>u} b^2\,dN_1(b) - \delta_n\Big) \le G_{n,n}(u) - G_{n,n}(-u) \le \sum_k X_{nk}^2 - \int_{|b|>u} b^2\,dN_1(b) + \delta_n$$

with large probability. However, $\sum_k X_{nk}^2$ approaches $\int_0^1 (dX(t))^2$. Hence

$$\lim_{u\to 0}\ \operatorname*{p\,lim}_{n\to\infty}\big(G_{n,n}(u) - G_{n,n}(-u)\big) = \int_0^1 (dX(t))^2 - \int_{-\infty}^{+\infty} x^2\,dN_1(x), \tag{4.9}$$

which completes the proof of (4.2).
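Relation (4.2) can be examined on simulated data: with $T = 1$ and $N = n$, the first estimate $G_{n,n}(u)$ computed from the $n$ increments should be close to the right-hand side of (4.2), with $\int_0^1 (dX(t))^2$ approximated by the sum of squared increments on the same grid. The simulation model (finitely many equally likely jump sizes, a Gaussian part, and a trend), the seed, and all names are hypothetical.

```python
import math
import random


def simulate_increments(n, gamma, sigma, lam, sizes, rng):
    """Increments X_{n1}, ..., X_{nn} on [0, 1] and the jump sizes, for a
    hypothetical process with trend gamma (in the sense of (1.1)), normal
    variance sigma^2 and equally likely jump sizes arriving at rate lam."""
    compensator = lam * sum(s / (1 + s * s) for s in sizes) / len(sizes)
    jump_in_cell = [0.0] * n
    jumps = []
    clock = rng.expovariate(lam)
    while clock < 1.0:
        size = rng.choice(sizes)
        jump_in_cell[min(int(clock * n), n - 1)] += size
        jumps.append(size)
        clock += rng.expovariate(lam)
    dt = 1.0 / n
    increments = [(gamma - compensator) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) + c
                  for c in jump_in_cell]
    return increments, jumps


def first_estimate_unit(u, increments):
    """G_{n,n}(u): estimate (1.2) with N = n, so the factor n/N equals 1."""
    return sum(x * x / (1 + x * x) for x in increments if x <= u)


def right_side_of_4_2(u, increments, jumps):
    """Right-hand side of (4.2), with the integral of (dX)^2 over [0, 1]
    replaced by the sum of squared increments on the same grid."""
    value = sum(b * b / (1 + b * b) for b in jumps if b <= u)
    if u >= 0:
        value += sum(x * x for x in increments) - sum(b * b for b in jumps)
    return value


rng = random.Random(11)
incs, jumps = simulate_increments(100000, 0.3, 1.0, 3.0, [-1.5, 0.5, 2.0], rng)
for u in (-1.0, 1.0):
    print(u, first_estimate_unit(u, incs), right_side_of_4_2(u, incs, jumps))
```

The values $u = \pm 1$ are chosen so that neither is a possible jump size, matching the requirement that $u$ be a continuity point.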


5. Estimating the trend term, γ. As remarked earlier, the probability law of the differential process $X$ is completely determined by a constant $\gamma$ and a bounded non-decreasing function $G$ as given in (1.1). Two estimates have already been obtained for $G$; it now remains to find an estimate for $\gamma$. In this section we shall derive an unbiased estimate $\hat\gamma$ of $\gamma$ based upon complete observation of the process for $t \in [0, 1]$. Indeed we shall prove that

$$\hat\gamma = X(1) - X(0) - \int_{-\infty}^{+\infty}\frac{x^3}{1+x^2}\,dN_1(x)$$

is an unbiased estimate of $\gamma$. It is trivial to prove that $\hat\gamma$ is an unbiased estimate of $\gamma$ when $X$ is a pure Gaussian process. Hence we shall assume that $X$ is not purely Gaussian in the development that follows.







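We can effectively represent $X(t)$ as a sum of three independent differential processes,

$$X(t) = U(t) + V(t) + W(t), \tag{5.1}$$

where

$$\begin{aligned} \ln f_{U(t)}(u) &= -\tfrac{1}{2}\sigma^2 t u^2,\\ \ln f_{V(t)}(u) &= i\gamma_1 u t + t\int_{0<|x|<K}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x),\\ \ln f_{W(t)}(u) &= i\gamma_2 u t + t\int_{|x|\ge K}\Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\,dG(x), \end{aligned} \tag{5.2}$$

where

$$\gamma_1 = \gamma\int_{0<|x|<K} dG(x)\Big/\int_{x \ne 0} dG(x), \qquad \gamma_2 = \gamma\int_{|x|\ge K} dG(x)\Big/\int_{x \ne 0} dG(x),$$

and $K > 0$ is an arbitrarily selected constant such that both $K$ and $-K$ are points of continuity of $G$. We first note that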


$$-i\frac{d}{du}\ln f_{V(1)}(u)\Big|_{u=0} = \gamma_1 + \int_{-K}^{K} x\,dG(x) = \mathcal{E}(V(1)) \tag{5.3}$$

exists and is finite. The results of Section 3 and the Lebesgue convergence theorem imply that

$$\hat\gamma_1 = V(1) - V(0) - \int_{-K}^{K}\frac{x^3}{1+x^2}\,dN_1(x) \tag{5.4}$$

is an unbiased estimate of $\gamma_1$. Further, we may write


(5.5) In fway(u) = tu (>. = |  dG(x)) +/ (e"“* — 1) : + ¥ dG(z). 
zj2x v0 z\|2Kx xc 


Thus W(1) can be expressed as

$$W(1) = \gamma_2 - \int_{|x|\ge K}\frac{1}{x}\,dG(x) + \int_{|x|\ge K} x\,dN_1(x), \qquad (5.6)$$

and consequently $W(1) - \int_{|x|\ge K} x\,dN_1(x)$ is a constant. But again by the results of Section 3, the random variable $\int_{|x|\ge K}(x/(1+x^2))\,dN_1(x)$ is an unbiased estimate of $\int_{|x|\ge K}(1/x)\,dG(x)$. Hence

$$\hat\gamma_2 = W(1) - W(0) - \int_{|x|\ge K}\frac{x^3}{1+x^2}\,dN_1(x) \qquad (5.7)$$

is an unbiased estimate of $\gamma_2$. Since $\gamma = \gamma_1 + \gamma_2$, then $\hat\gamma = X(1) - X(0) - \int_{-\infty}^{\infty}(x^3/(1+x^2))\,dN_1(x)$ is an unbiased estimate of $\gamma$.
Now if we let 





658 HERMAN RUBIN AND HOWARD G. TUCKER 


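$$\hat\gamma(t) = \frac{1}{t}\Big\{X(t) - X(0) - \int_{-\infty}^{\infty}\frac{x^3}{1+x^2}\,dN_t(x)\Big\},$$

then not only is $\hat\gamma(t)$ an unbiased estimate of $\gamma$, but it is also consistent in the sense that

$$P\{\lim_{t\to\infty}\hat\gamma(t) = \gamma\} = 1.$$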
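The unbiasedness and consistency of $\hat\gamma$ can be checked by simulation. The sketch below assumes a hypothetical differential process: Brownian motion with drift c, plus compound Poisson jumps of rate λ whose sizes J are uniform on {2, −1}. Consider the Lévy–Khintchine form in which the jump measure is $(1+x^2)x^{-2}\,dG(x)$ and the compensator is $iux/(1+x^2)$. In that form, such a process has $\gamma = c + \lambda E[J/(1+J^2)]$. All parameter values and function names are illustrative and do not come from the paper.

```python
import math
import random

def poisson_count(rng, rate, t):
    """Number of arrivals in [0, t] of a Poisson process with the given rate."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count

def simulate_path(rng, t, sigma, drift, rate, jump_values):
    """Return (X(t) - X(0), jump sizes in [0, t]) for Brownian motion with drift
    plus compound Poisson jumps drawn uniformly from jump_values (hypothetical model)."""
    jumps = [rng.choice(jump_values) for _ in range(poisson_count(rng, rate, t))]
    gaussian_part = sigma * math.sqrt(t) * rng.gauss(0.0, 1.0)
    return drift * t + gaussian_part + sum(jumps), jumps

def gamma_hat(increment, jumps, t=1.0):
    """{X(t) - X(0) - sum over jumps x of x^3 / (1 + x^2)} / t."""
    return (increment - sum(x ** 3 / (1.0 + x * x) for x in jumps)) / t

def true_gamma(drift, rate, jump_values):
    """gamma = drift + rate * E[J / (1 + J^2)] for this compound Poisson model."""
    mean_term = sum(x / (1.0 + x * x) for x in jump_values) / len(jump_values)
    return drift + rate * mean_term

rng = random.Random(12345)
params = dict(sigma=1.0, drift=0.5, rate=3.0, jump_values=(2.0, -1.0))
gamma = true_gamma(params["drift"], params["rate"], params["jump_values"])

# Unbiasedness on [0, 1]: average gamma_hat over many independent paths.
estimates = [gamma_hat(*simulate_path(rng, 1.0, **params)) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)

# Consistency: gamma_hat(t) computed from one long path.
long_t = 5000.0
long_estimate = gamma_hat(*simulate_path(rng, long_t, **params), t=long_t)
print(gamma, mean_estimate, long_estimate)
```

Each jump contributes $x - x^3/(1+x^2) = x/(1+x^2)$ to $\hat\gamma$. The estimate is therefore the drift plus a Gaussian term plus a compensated-style jump sum, and its mean is $\gamma$ (here 0.35). Any comparison should allow for Monte Carlo error.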


REFERENCES 
[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, 1953.


[2] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley Publishing Company, Inc., 1954.





A PROBLEM IN OPTIMUM FILTERING WITH FINITE DATA 
By GOPINATH KALLIANPUR¹
Michigan State University 


1. Introduction. In this paper we consider the problem of filtering in the 
presence of Gaussian noise for certain special types of message functions (see 
Sections 2 and 3 below). On the basis of observations of the corrupted message 
over a finite time interval it is required to provide an optimum predictor (or 
filter) of the message at some future time instant. Optimum is here understood 
in the sense of minimizing expected square deviation. 

The results of Sections 2 and 3 have been stated by Laning and Battin in their 
recent book [1, pp. 343 et seq.] in which they have advanced heuristic arguments 
for the plausibility of these results. A large part of this paper may therefore be 
regarded as a rigorous justification of the results stated in [1]. However, we do 
not attempt to justify their arguments but present a proof based on a different 
approach. 

The special types of messages have been chosen with the idea of generalizing naturally to the case of an arbitrary message x(t) which is a continuous, second order process independent of the noise. In Section 4 it is shown that the considerations of the earlier sections lead to at least a theoretical determination of the optimum filter. Section 5 rigorously derives an expression for the expected error. Comments on the scope of the method are offered in Section 6.


2. Filtering when the message is of the form Vg(t). The message and noise functions x(t, ω) and y(t, ω) ($0 \le t \le T$) will be assumed to be suitable families of random variables defined on a probability space $(\Omega, \mathcal{B}, P)$. For the present we shall consider the simple special case

$$x(t, \omega) = V(\omega)g(t), \qquad (2.1)$$

where V is a random variable with probability density h(v), with EV = 0 and finite second moment, and g(t) is a known continuous function which is not almost everywhere zero in [0, T]. We further assume that the noise process y(t, ω) is jointly measurable in (t, ω), and Gaussian with covariance function R(t, s) which is continuous in the square $\{0 \le t, s \le T\}$. From these assumptions it follows that y(t, ω) is continuous in quadratic mean (q.m.) and that y(t, ω) is square integrable over [0, T] for almost all ω.

The assumption of joint measurability in (t, w) of y is not restrictive in view 
of Theorem 2.6 of [3, p. 61]. 

The corrupted message z(t, ω) is given by


Received January 28, 1958; revised June 5, 1959. 
¹On leave from the Indian Statistical Institute, Calcutta.


659 







$$z(t, \omega) = V(\omega)g(t) + y(t, \omega). \qquad (2.2)$$

We suppose throughout that V and y(t, ω) are independent.

From the set of observations of z(t) over the finite time interval ($0 \le t \le T$) it is required to construct the "best" predictor or filter of $x(T + T_1)$ ($T_1 > 0$), "best" being understood in the sense of minimizing the expected square deviation from the message at time $T + T_1$. It is known that such an optimum filter is given by the conditional expectation

$$E(x(T + T_1) \mid z(t); 0 \le t \le T). \qquad (2.3)$$

We shall now make the following basic assumption: the integral equation of the first kind

$$\int_0^T R(t, s)p(s)\,ds = g(t) \qquad (2.4)$$

has a square integrable solution p(t).

Let $\{\lambda_n\}$, $\{\phi_n(t)\}$ be the eigenvalues and eigenfunctions respectively of the (continuous) positive definite function R(t, s), in the sense that $\int_0^T R(t, s)\phi_n(s)\,ds = \phi_n(t)/\lambda_n$. Then it is easily seen that the random variables

$$y_n(\omega) = \sqrt{\lambda_n}\int_0^T y(t, \omega)\phi_n(t)\,dt \qquad (n \ge 1)$$

are independent and normally distributed with mean zero and variance unity.

Define the random variables

$$z_n(\omega) = \int_0^T z(t, \omega)\phi_n(t)\,dt \qquad (n \ge 1).$$

The above integrals exist for almost all ω. By defining $z_n(\omega)$ arbitrarily over a set of probability zero we may assume $z_n(\omega)$ to be defined for all ω. In the following we shall omit the letter ω to simplify the writing. It should be remembered, however, that certain Borel fields to be introduced below are Borel fields of ω sets. We shall need the following lemma:

Lemma. If condition (2.4) holds, then for every t in [0, T],

$$z(t) = \sum_{n=1}^{\infty} z_n\phi_n(t) \quad \text{in quadratic mean.}$$

Note that this representation of z(t) is not the familiar one given in terms of uncorrelated random variables, since the $\phi_n(t)$ are not the eigenfunctions corresponding to the covariance function of z(t). Denoting the covariance of z(t) by K(t, s) we obtain

$$E\Big|z(t) - \sum_{n=1}^N z_n\phi_n(t)\Big|^2 = K(t, t) - 2\sum_{n=1}^N \phi_n(t)\int_0^T K(t, s)\phi_n(s)\,ds + \sum_{n,m=1}^N \phi_n(t)\phi_m(t)\int_0^T\!\!\int_0^T K(u, s)\phi_n(u)\phi_m(s)\,du\,ds. \qquad (2.5)$$





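Since

$$K(t, s) = g(t)g(s) + R(t, s),$$

we have

$$\int_0^T K(t, s)\phi_n(s)\,ds = a_n g(t) + \frac{\phi_n(t)}{\lambda_n}, \qquad \int_0^T\!\!\int_0^T K(u, s)\phi_n(u)\phi_m(s)\,du\,ds = a_n a_m + \frac{\delta_{nm}}{\lambda_n},$$

where $a_n = \int_0^T g(t)\phi_n(t)\,dt$ and $\delta_{nm}$ is the Kronecker symbol. Using the above relations we have

$$E\Big|z(t) - \sum_{n=1}^N z_n\phi_n(t)\Big|^2 = K(t, t) + \Big[\sum_{n=1}^N a_n\phi_n(t)\Big]^2 - 2g(t)\Big[\sum_{n=1}^N a_n\phi_n(t)\Big] - \sum_{n=1}^N \frac{\phi_n^2(t)}{\lambda_n}. \qquad (2.6)$$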
Observe first that Mercer's theorem is applicable to the continuous, positive semi-definite, symmetric kernel R(t, s), and hence that $\sum_{n=1}^{\infty}\phi_n^2(t)/\lambda_n$ converges uniformly to R(t, t). Secondly, g(t) is a "quellenmäßig" function on account of condition (2.4), and hence (see [2], pp. 114-115) its Fourier series satisfies

$$\sum_{n=1}^{\infty} a_n\phi_n(t) = g(t),$$

the convergence being, in fact, absolute and uniform. From these two remarks and the fact that

$$K(t, t) = g^2(t) + R(t, t)$$

we conclude from (2.6) that

$$\lim_{N\to\infty} E\Big|z(t) - \sum_{n=1}^N z_n\phi_n(t)\Big|^2 = 0.$$


Let $\mathcal{F}\{z(t), 0 \le t \le T\}$ be the smallest Borel field of ω sets with respect to which the z(t)'s are measurable, and let $\mathcal{F}\{z_1, z_2, \ldots\}$ be similarly defined. If $\mathcal{F}'$ denotes the Borel field of sets which are either in $\mathcal{F}$ or differ from $\mathcal{F}$ sets by sets of probability zero, it is clear from the above lemma that

$$\mathcal{F}'\{z(t), 0 \le t \le T\} = \mathcal{F}'\{z_1, z_2, \ldots\}.$$

Hence, from a martingale theorem of Doob ([3], p. 331),

$$\lim_{N\to\infty} E[Vg(T + T_1) \mid z_1, \ldots, z_N] = E[Vg(T + T_1) \mid z(t), 0 \le t \le T] \qquad (2.7)$$

with probability one.







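The problem of finding the optimum filter is thus reduced to first finding $E[Vg(T + T_1) \mid z_1, \ldots, z_n]$. The joint probability density $f(u, z_1, \ldots, z_n)$ of $Vg_1, z_1, \ldots, z_n$ (where $g_1 = g(T + T_1)$) is then given by

$$f(u, z_1, \ldots, z_n) = \frac{\sqrt{\lambda_1\cdots\lambda_n}}{(\sqrt{2\pi})^n\,|g_1|}\,h\Big(\frac{u}{g_1}\Big)\exp\Big[-\tfrac12\sum_{j=1}^n \lambda_j\Big(z_j - \frac{a_j u}{g_1}\Big)^2\Big]. \qquad (2.8)$$

Hence,

$$E[Vg(T + T_1) \mid z_1, \ldots, z_n] = g(T + T_1)\,\frac{\int_{-\infty}^{\infty} x h(x)\exp\big[-\tfrac12\sum_{j=1}^n \lambda_j(z_j - a_j x)^2\big]\,dx}{\int_{-\infty}^{\infty} h(x)\exp\big[-\tfrac12\sum_{j=1}^n \lambda_j(z_j - a_j x)^2\big]\,dx}$$
$$= g(T + T_1)\,\frac{\int_{-\infty}^{\infty} x h(x)\exp[B_n x - \tfrac12 A_n x^2]\,dx}{\int_{-\infty}^{\infty} h(x)\exp[B_n x - \tfrac12 A_n x^2]\,dx}. \qquad (2.9)$$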
The quantities $B_n$ and $A_n$ appearing above are given by

$$B_n = \sum_{j=1}^n \lambda_j a_j z_j = \int_0^T z(t)p_n(t)\,dt,$$

where

$$p_n(t) = \sum_{j=1}^n \lambda_j a_j\phi_j(t),$$

and

$$A_n = \sum_{j=1}^n \lambda_j a_j^2.$$
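The step from (2.8) to (2.9) uses the fact that the Gaussian exponent depends on x only through $B_n$ and $A_n$: $-\tfrac12\sum_j\lambda_j(z_j - a_jx)^2 = -\tfrac12C_n + B_nx - \tfrac12A_nx^2$ with $C_n = \sum_j\lambda_jz_j^2$. The factor $e^{-C_n/2}$ does not involve x, so it cancels in (2.9). The sketch below checks this identity numerically. The values of $\lambda_j$, $a_j$ and V are hypothetical, and the $z_j$ are generated as $a_jV + y_j/\sqrt{\lambda_j}$ with independent standard normal $y_j$.

```python
import math
import random

def full_exponent(x, z, lam, a):
    """-1/2 * sum_j lam_j (z_j - a_j x)^2, the exponent appearing in (2.8)."""
    return -0.5 * sum(l * (zj - aj * x) ** 2 for l, zj, aj in zip(lam, z, a))

def reduced_stats(z, lam, a):
    """B_n = sum lam_j a_j z_j, A_n = sum lam_j a_j^2, C_n = sum lam_j z_j^2."""
    B = sum(l * aj * zj for l, aj, zj in zip(lam, a, z))
    A = sum(l * aj * aj for l, aj in zip(lam, a))
    C = sum(l * zj * zj for l, zj in zip(lam, z))
    return B, A, C

rng = random.Random(7)
n = 6
lam = [(k + 0.5) ** 2 for k in range(n)]   # hypothetical lambda_j
a = [1.0 / (k + 1) for k in range(n)]      # hypothetical a_j
v = 0.8                                    # hypothetical realized value of V
# z_j = a_j V + y_j / sqrt(lambda_j) with y_j independent N(0, 1)
z = [aj * v + rng.gauss(0.0, 1.0) / math.sqrt(l) for aj, l in zip(a, lam)]

B, A, C = reduced_stats(z, lam, a)
grid = [-3.0 + 0.25 * k for k in range(25)]
max_gap = max(abs(full_exponent(x, z, lam, a) - (-0.5 * C + B * x - 0.5 * A * x * x))
              for x in grid)
print(B, A, max_gap)
```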


According to our assumption, the integral equation (2.4) has a square integrable solution p(t) ($0 \le t \le T$). Therefore, by a theorem of Picard ([2], pp. 135-136), $\sum_{j=1}^{\infty}\lambda_j^2a_j^2 < \infty$ and $p(t) = \sum_{j=1}^{\infty}\lambda_ja_j\phi_j(t)$, the series on the right converging in mean. Hence

$$\lim_{n\to\infty} B_n = \int_0^T z(t)p(t)\,dt \quad \text{with probability one.} \qquad (2.10)$$

Further, the series $\sum_{j=1}^{\infty}\lambda_ja_j^2$ converges, and we have

$$\lim_{n\to\infty} A_n = \sum_{j=1}^{\infty}\lambda_ja_j^2 = \int_0^T g(t)p(t)\,dt > 0. \qquad (2.11)$$







From (2.9), (2.10) and (2.11) we find that the optimum filter for $x(T + T_1)$ is $H(\int_0^T z(t)p(t)\,dt)$, a function which depends on the observations z(t) only through the statistic $\int_0^T z(t)p(t)\,dt$. In this case, the function H is specifically given by

$$H(\xi) = g(T + T_1)\,\frac{\int_{-\infty}^{\infty} x h(x)\exp\Big[\xi x - \dfrac{x^2}{2}\displaystyle\int_0^T g(t)p(t)\,dt\Big]\,dx}{\int_{-\infty}^{\infty} h(x)\exp\Big[\xi x - \dfrac{x^2}{2}\displaystyle\int_0^T g(t)p(t)\,dt\Big]\,dx}.$$
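As a check on the formula for H, the following sketch evaluates it by Simpson quadrature for a user-supplied prior density h. It then compares the result with the closed form that holds when V is assumed to be $N(0, s^2)$. In that Gaussian case the weight $h(x)e^{\xi x - Ax^2/2}$, with $A = \int_0^T g(t)p(t)\,dt$, is itself Gaussian with mean $\xi s^2/(1 + As^2)$. The prior, A, $g(T+T_1)$ and ξ used here are hypothetical numbers, not values from the paper.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def optimum_filter(xi, density, A, g_future, half_width=12.0):
    """H(xi) = g(T+T1) * int x h(x) e^{xi x - A x^2/2} dx / int h(x) e^{xi x - A x^2/2} dx,
    with A = int_0^T g(t) p(t) dt and xi = int_0^T z(t) p(t) dt."""
    def weight(x):
        return density(x) * math.exp(xi * x - 0.5 * A * x * x)
    num = simpson(lambda x: x * weight(x), -half_width, half_width)
    den = simpson(weight, -half_width, half_width)
    return g_future * num / den

s = 1.3                                   # hypothetical prior s.d. of V
def gaussian_density(x):
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

A, g_future, xi = 2.0, 0.7, 0.9           # hypothetical values
numeric = optimum_filter(xi, gaussian_density, A, g_future)
closed_form = g_future * xi * s ** 2 / (1.0 + A * s ** 2)
print(numeric, closed_form)
```

For a Gaussian prior the filter is linear in ξ. For any other h the same quadrature applies, but H is then in general a nonlinear function of the single statistic.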


3. Extension to more general types of messages. Let the message x(t) be given by

$$x(t) = \sum_{i=1}^m V_ig_i(t), \qquad (3.1)$$

the $g_i(t)$ being continuous functions. We assume that $V_1, \ldots, V_m$ have a joint probability density $h(v_1, \ldots, v_m)$. The noise y(t) is supposed, as in the previous section, to be independent of x(t). Condition (2.4) is now replaced by the assumption that each of the m integral equations of the first kind

$$\int_0^T R(t, s)p_i(s)\,ds = g_i(t) \qquad (i = 1, \ldots, m) \qquad (3.2)$$

has a square integrable solution $p_i(t)$. Set

$$a_j^{(i)} = \int_0^T g_i(t)\phi_j(t)\,dt \quad\text{and}\quad g_{i0} = g_i(T + T_1).$$

The following relations are easily verified:

$$z_j = \sum_{i=1}^m a_j^{(i)}V_i + \frac{y_j}{\sqrt{\lambda_j}} \qquad (j = 1, 2, \ldots).$$


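The joint distribution function $F(u, z_1, \ldots, z_n)$ of

$$\sum_{i=1}^m V_ig_{i0},\ z_1, \ldots, z_n$$

is given by

$$F(u, z_1, \ldots, z_n) = \int\cdots\int_D h(x_1, \ldots, x_m)\,(\sqrt{2\pi})^{-n}\Big[\int_{-\infty}^{\sqrt{\lambda_1}(z_1 - \sum_i a_1^{(i)}x_i)}\cdots\int_{-\infty}^{\sqrt{\lambda_n}(z_n - \sum_i a_n^{(i)}x_i)}\exp[-\tfrac12(t_1^2 + \cdots + t_n^2)]\,dt_1\cdots dt_n\Big]\,dx_1\cdots dx_m,$$

where D is the region $\{\sum_{i=1}^m g_{i0}x_i \le u\}$. Then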










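$$\frac{\partial^n F}{\partial z_1\cdots\partial z_n} = \frac{\sqrt{\lambda_1\cdots\lambda_n}}{(\sqrt{2\pi})^n}\int\cdots\int_D h(x_1, \ldots, x_m)\exp\Big[-\tfrac12\sum_{j=1}^n\lambda_j\Big(z_j - \sum_{i=1}^m a_j^{(i)}x_i\Big)^2\Big]\,dx_1\cdots dx_m$$
$$= \frac{\sqrt{\lambda_1\cdots\lambda_n}}{(\sqrt{2\pi})^n}\int\cdots\int_D h(x_1, \ldots, x_m)\exp\{-\tfrac12[C_n - 2B_n'x + x'A_nx]\}\,dx_1\cdots dx_m. \qquad (3.3)$$

Here

$$C_n = \sum_{j=1}^n \lambda_jz_j^2,$$

$x = (x_1, \ldots, x_m)'$, $B_n$ is the vector

$$B_n = (B_n^{(1)}, \ldots, B_n^{(m)})', \qquad B_n^{(i)} = \int_0^T z(t)p_n^{(i)}(t)\,dt,$$

where

$$p_n^{(i)}(t) = \sum_{j=1}^n \lambda_ja_j^{(i)}\phi_j(t),$$

and $A_n$ is the matrix $(A_n^{(ir)})$,

$$A_n^{(ir)} = \sum_{j=1}^n \lambda_ja_j^{(i)}a_j^{(r)}.$$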

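Condition (3.2) implies, as in Section 2, that

$$\lim_{n\to\infty} A_n^{(ir)} = \sum_{j=1}^{\infty}\lambda_ja_j^{(i)}a_j^{(r)}$$

exists, since

$$\sum_{j=1}^{\infty}\lambda_j\big|a_j^{(i)}a_j^{(r)}\big| \le \Big(\sum_{j=1}^{\infty}\lambda_j(a_j^{(i)})^2\Big)^{1/2}\Big(\sum_{j=1}^{\infty}\lambda_j(a_j^{(r)})^2\Big)^{1/2} < \infty.$$

Hence

$$\lim_{n\to\infty} A_n^{(ir)} = \int_0^T p_i(t)g_r(t)\,dt = A^{(ir)}, \text{ say.} \qquad (3.4)$$

Further, as before,

$$\lim_{n\to\infty} B_n^{(i)} = \int_0^T z(t)p_i(t)\,dt, \quad \text{with probability one } (i = 1, \ldots, m). \qquad (3.5)$$

Hence, making $n \to \infty$ in the expression for

$$E\Big[\sum_{i=1}^m V_ig_i(T + T_1) \;\Big|\; z_1, \ldots, z_n\Big],$$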






we obtain from (3.3), (3.4) and (3.5) the optimum filter to be

$$\sum_{i=1}^m g_i(T + T_1)\,\frac{\int\cdots\int x_ih(x_1, \ldots, x_m)\exp[B'x - \tfrac12x'Ax]\,dx_1\cdots dx_m}{\int\cdots\int h(x_1, \ldots, x_m)\exp[B'x - \tfrac12x'Ax]\,dx_1\cdots dx_m}, \qquad (3.6)$$

where B is the vector $(B^{(1)}, \ldots, B^{(m)})$ and A is the matrix $(A^{(ir)})$. We observe that (3.6) turns out to be a function which depends on the observations z(t) only through the m statistics

$$\int_0^T z(t)p_i(t)\,dt \qquad (i = 1, \ldots, m).$$


In the expression for the optimum filter given by (3.6) it is important that the matrix A be positive definite. This is ensured by the following criterion: for the matrix A to be positive definite, it is necessary and sufficient that the functions $g_i(t)$ ($i = 1, \ldots, m$) be linearly independent. From (3.2) and the definition of $A^{(ir)}$ we have

$$A^{(ir)} = \int_0^T\!\!\int_0^T R(t, s)p_i(t)p_r(s)\,dt\,ds.$$

Now, R(t, s), being a covariance function, is a positive semi-definite function, that is, the double integral

$$Q(f) = \int_0^T\!\!\int_0^T R(t, s)f(t)f(s)\,dt\,ds \ge 0 \qquad (3.7)$$

for all functions f which are in $L_2(0, T)$. Setting $f(t) = u_1p_1(t) + \cdots + u_mp_m(t)$, we have $Q = u'Au$, where u is the vector with the real numbers $u_1, \ldots, u_m$ as components. Let A be positive definite. If the functions $g_i(t)$ are linearly dependent, then from the relation

$$\sum_{i=1}^m u_ig_i(t) = \int_0^T R(t, s)\Big[\sum_{i=1}^m u_ip_i(s)\Big]\,ds$$

it follows that $u'Au = 0$ for some $u \ne 0$, a contradiction. Conversely, $u'Au = 0$ for $u \ne 0$ implies f(t) = 0 a.e., so that from the above relation it follows that the $g_i$ are linearly dependent. Hence A is positive definite if and only if the functions $\{g_i(t)\}$ are linearly independent. Our assertion is thus proved.
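The criterion can be illustrated on a discretized example. The sketch assumes $R(t, s) = \min(t, s)$ on [0, 1], a hypothetical choice of noise covariance. It builds $g_i = \int R\,p_i$ with a midpoint rule, forms $A^{(ij)} = \int p_i g_j\,dt$ as in (3.4), and tests positive definiteness by attempting a Cholesky factorization. The functions $p_i$ are hypothetical combinations of the eigenfunctions $\sqrt2\sin((n - \tfrac12)\pi t)$ of this kernel. In the second case $p_3 = p_1 + 2p_2$, so the $g_i$ are linearly dependent.

```python
import math

N = 200                                   # midpoint-rule grid size on [0, 1]
H = 1.0 / N
GRID = [(k + 0.5) * H for k in range(N)]

def phi(n, t):
    """Eigenfunctions of min(t, s) on [0, 1], n = 1, 2, 3, ..."""
    return math.sqrt(2.0) * math.sin((n - 0.5) * math.pi * t)

def apply_R(p):
    """g(t) = int_0^1 min(t, s) p(s) ds on GRID (midpoint rule), as in (3.2)."""
    return [H * sum(min(t, s) * p_s for s, p_s in zip(GRID, p)) for t in GRID]

def matrix_A(ps):
    """A^{(ij)} = int_0^1 p_i(t) g_j(t) dt with g_j = R p_j."""
    gs = [apply_R(p) for p in ps]
    return [[H * sum(a * b for a, b in zip(p, g)) for g in gs] for p in ps]

def is_positive_definite(M, rel_tol=1e-10):
    """Cholesky test: every pivot must exceed rel_tol * (largest diagonal entry)."""
    n = len(M)
    tol = rel_tol * max(M[i][i] for i in range(n))
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

f = [[phi(n, t) for t in GRID] for n in (1, 2, 3)]
p_indep = [[a + b for a, b in zip(f[0], f[1])],        # phi_1 + phi_2
           [a - b for a, b in zip(f[1], f[2])],        # phi_2 - phi_3
           f[2]]                                       # phi_3
p_dep = p_indep[:2] + [[a + 2 * b for a, b in zip(p_indep[0], p_indep[1])]]  # p_1 + 2 p_2

print(is_positive_definite(matrix_A(p_indep)), is_positive_definite(matrix_A(p_dep)))
```

The relative tolerance is a numerical design choice. In the dependent case the last Cholesky pivot is zero up to rounding, rather than exactly zero.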


We can now state fully the assumptions under which the filtering problem of this section has been solved:

(3.8) The continuous functions $g_1(t), \ldots, g_m(t)$ appearing in the message are linearly independent.

(3.9) Each of the integral equations (3.2) has a square integrable solution $p_i(t)$.







4. Message x(t), an arbitrary second order continuous process. We assume independence of x(t) and y(t) and that Ex(t) = 0. Let M(t, s) be the covariance of x(t) and let $\{\psi_i(t)\}$, $\{\mu_i\}$ denote the eigenfunctions and eigenvalues, respectively, of M(t, s) ($0 \le t, s \le T + T_1$). Then we have the usual expansion for x(t), which is convergent in quadratic mean:

$$x(T + T_1) = \lim_{m\to\infty}\sum_{i=1}^m x_i\,\frac{\psi_i(T + T_1)}{\sqrt{\mu_i}}, \qquad (4.1)$$

where the $x_i$'s are uncorrelated variables with mean zero and variance one. Hence it follows that

$$E[x(T + T_1) \mid z(t); 0 \le t \le T] = \lim_{m\to\infty} E\Big[\sum_{i=1}^m x_i\,\frac{\psi_i(T + T_1)}{\sqrt{\mu_i}} \;\Big|\; z(t); 0 \le t \le T\Big]. \qquad (4.2)$$

However, the expression on the right side of (4.2) whose limit in mean is taken has been evaluated in the preceding section and is given by (3.6), where we identify

$$g_i(T + T_1) = \frac{\psi_i(T + T_1)}{\sqrt{\mu_i}} \quad\text{and}\quad V_i = x_i.$$

Writing $h_m(x_1, \ldots, x_m)$ for the joint density of $(x_1, \ldots, x_m)$, we then have proved the result

$$E[x(T + T_1) \mid z(t); 0 \le t \le T] = \lim_{m\to\infty}\sum_{i=1}^m \frac{\psi_i(T + T_1)}{\sqrt{\mu_i}}\,\frac{\int\cdots\int x_ih_m(x_1, \ldots, x_m)\exp[B'x - \tfrac12x'Ax]\,dx_1\cdots dx_m}{\int\cdots\int h_m(x_1, \ldots, x_m)\exp[B'x - \tfrac12x'Ax]\,dx_1\cdots dx_m}. \qquad (4.3)$$

Here the vector B and the matrix A are defined as in Section 3. (4.3) tells us that in the general case the optimum filter can be approximated by functions of the form (3.6). It also shows that the optimum filter, in general, is a function which depends on z(t) only through the statistics

$$\int_0^T z(t)p_j(t)\,dt \qquad (j = 1, 2, \ldots).$$

The actual computation of the approximation is bound to be tedious, since the number of elements in the vector B and in the matrix A becomes large for large values of m.


5. Expression for expected error under optimum filtering. We shall assume 
now that the signal or message is the finite sum given in (3.1), with the proviso 







that the functions $g_1, \ldots, g_m$ are linearly independent. The optimum filter is then given by the function $H(B^{(1)}, \ldots, B^{(m)})$ defined by (3.6). Writing

$$e = H(B^{(1)}, \ldots, B^{(m)}) - \sum_{i=1}^m V_ig_{i0},$$

the expected error is $E(e^2)$. Before we compute this quantity we observe that

$$B^{(i)} = \sum_{j=1}^m A^{(ij)}V_j + B_0^{(i)}, \qquad \text{where } B_0^{(i)} = \int_0^T p_i(t)y(t)\,dt.$$

It is easy to see that the random vector $B_0$ with components $B_0^{(i)}$ ($i = 1, \ldots, m$) is normally distributed with zero mean vector and covariance matrix A. Let $R_m$ denote the m-dimensional Euclidean space, and let $a_i$, $g_0$ ($i = 1, \ldots, m$) and v be the column vectors

$$a_i' = [A^{(i1)}, \ldots, A^{(im)}], \qquad g_0' = [g_{10}, \ldots, g_{m0}], \qquad v' = [v_1, \ldots, v_m].$$

Then, using vector notation,

$$E(e^2) = (2\pi)^{-m/2}|A|^{-1/2}\int_{R_m}\exp[-\tfrac12x'A^{-1}x]\Big\{\int_{R_m}[H(x_1 + v'a_1, \ldots, x_m + v'a_m) - v'g_0]^2h(v)\,dv\Big\}\,dx.$$

Introducing an appropriate change of variables as in [1] (pp. 353-355) we obtain the expression for $E(e^2)$ given by relations (8.6-28) and (8.6-29) in [1]:

$$E(e^2) = (2\pi)^{-m/2}|A|^{-1/2}\int_{R_m}F(x)\exp[-\tfrac12x'A^{-1}x]\,dx, \qquad (5.1)$$

where

$$F(x) = \int_{R_m}[H(x) - v'g_0]^2\exp[-\tfrac12(v'Av - 2v'x)]\,h(v)\,dv.$$
Rm 


As an illustration of the method of this paper we shall obtain the optimum filter together with the expected error for the following simple example. Let the signal be $x(t) = V_1g_1(t) + V_2g_2(t)$ ($g_1$ and $g_2$ being linearly independent), where $V_1$ and $V_2$ have the joint distribution

$$P(V_1 = 1, V_2 = 0) = P(V_1 = 0, V_2 = 1) = \tfrac12.$$

This means that the signal can, with equal probability, be one of the two functions $g_1(t)$ and $g_2(t)$. The expressions for the optimum filter and the expected error obtained above still hold true (with obvious changes) when the vector V does not have a probability density. We may now write the signal







$$x(t) = \tfrac12(g_1(t) + g_2(t)) + V_1'g_1(t) + V_2'g_2(t),$$

where $P(V_1' = \tfrac12, V_2' = -\tfrac12) = P(V_1' = -\tfrac12, V_2' = \tfrac12) = \tfrac12$. Denoting by H and $H_0$ the optimum filters when the corresponding signals are x(t) and $V_1'g_1(t) + V_2'g_2(t)$ respectively, we have $H = \tfrac12(g_{10} + g_{20}) + H_0$, where from (3.6)

$$H_0 = \tfrac12(g_{10} - g_{20})\,\frac{\exp[\tfrac12(B_1 - B_2)] - \exp[-\tfrac12(B_1 - B_2)]}{\exp[\tfrac12(B_1 - B_2)] + \exp[-\tfrac12(B_1 - B_2)]}.$$

Hence

$$H = \frac{g_{10}\exp[\tfrac12(B_1 - B_2)] + g_{20}\exp[-\tfrac12(B_1 - B_2)]}{\exp[\tfrac12(B_1 - B_2)] + \exp[-\tfrac12(B_1 - B_2)]}.$$
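When the prior is discrete, the integrals in (3.6) become finite sums over the support points v: $\sum_v (v'g_0)\,w(v)e^{B'v - v'Av/2} / \sum_v w(v)e^{B'v - v'Av/2}$. The sketch below implements that sum and checks it against the formula for $H_0$ with the centered two-point prior $\pm(\tfrac12, -\tfrac12)$. For that prior $v'Av$ is the same at both points, so the result reduces to $\tfrac12(g_{10} - g_{20})\tanh\tfrac12(B_1 - B_2)$. The values of A, $g_0$ and B are hypothetical, and the function name is illustrative.

```python
import math

def discrete_prior_filter(B, A, support, probs, g0):
    """(3.6) with the integrals replaced by sums over a discrete prior:
    sum_v (v . g0) w(v) e^{B'v - v'Av/2} / sum_v w(v) e^{B'v - v'Av/2}."""
    m = len(B)

    def exponent(v):
        quad = sum(v[i] * A[i][j] * v[j] for i in range(m) for j in range(m))
        return sum(B[i] * v[i] for i in range(m)) - 0.5 * quad

    logs = [math.log(p) + exponent(v) for v, p in zip(support, probs)]
    top = max(logs)                        # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    num = sum(w * sum(vi * gi for vi, gi in zip(v, g0)) for w, v in zip(weights, support))
    return num / sum(weights)

A = ((1.5, 0.4), (0.4, 1.1))               # hypothetical A^{(ij)}
g0 = (1.0, -0.5)                           # hypothetical (g_10, g_20)
support = ((0.5, -0.5), (-0.5, 0.5))       # centered two-point prior
probs = (0.5, 0.5)
B = (0.7, -0.2)                            # hypothetical observed (B_1, B_2)

h0_general = discrete_prior_filter(B, A, support, probs, g0)
h0_closed = 0.5 * (g0[0] - g0[1]) * math.tanh(0.5 * (B[0] - B[1]))
print(h0_general, h0_closed)
```

Working with log-weights and subtracting their maximum keeps the sum stable when $B_1 - B_2$ is large. In that regime the filter saturates at $\pm\tfrac12(g_{10} - g_{20})$.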


To compute the expected error $E(e^2) = E[H_0 - (V_1'g_{10} + V_2'g_{20})]^2$ from the expression (5.1) we first evaluate $F(x_1, x_2)$:

$$F(x_1, x_2) = \tfrac14(g_{10} - g_{20})^2\exp[-\tfrac18(A^{(11)} - 2A^{(12)} + A^{(22)})]\,[\cosh\tfrac12(x_1 - x_2)]^{-1},$$

and

$$A^{(11)} - 2A^{(12)} + A^{(22)} = \int_0^T [g_1(t) - g_2(t)][p_1(t) - p_2(t)]\,dt.$$

Writing D for this quantity, the expected error is

$$E(e^2) = \tfrac14(g_{10} - g_{20})^2\exp(-\tfrac18D)\,(2\pi)^{-1}|A|^{-1/2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\frac{\exp(-\tfrac12x'A^{-1}x)}{\cosh\tfrac12(x_1 - x_2)}\,dx_1\,dx_2.$$

The inequality $\cosh u \ge 1$ yields the following upper bound to $E(e^2)$:

$$E(e^2) \le \tfrac14(g_{10} - g_{20})^2\exp\Big\{-\tfrac18\int_0^T(g_1 - g_2)(p_1 - p_2)\,dt\Big\}.$$
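The error formula for the two-signal example can be checked numerically. Under the Gaussian weight $\exp(-\tfrac12x'A^{-1}x)$, the difference $x_1 - x_2$ has variance $A^{(11)} - 2A^{(12)} + A^{(22)} = D$. The double integral therefore reduces to $E[\operatorname{sech}(W/2)]$ with $W \sim N(0, D)$. The sketch below evaluates that one-dimensional expectation by Simpson's rule. It compares the result with a Monte Carlo simulation of $V'$, $B = AV' + B_0$ with $B_0 \sim N(0, A)$, and the filter $H_0$, and also checks the cosh bound. The matrix A and the values $g_{10}$, $g_{20}$ are hypothetical.

```python
import math
import random

def expected_error_formula(g10, g20, D, n=4000):
    """(1/4)(g10 - g20)^2 exp(-D/8) E[sech(W/2)], W ~ N(0, D), via Simpson's rule."""
    sd = math.sqrt(D)
    lo, hi = -10.0 * sd, 10.0 * sd
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        w = lo + k * h
        coeff = 1 if k in (0, n) else (4 if k % 2 else 2)
        density = math.exp(-0.5 * (w / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += coeff * density / math.cosh(0.5 * w)
    return 0.25 * (g10 - g20) ** 2 * math.exp(-D / 8.0) * total * h / 3.0

def expected_error_monte_carlo(g10, g20, A, draws, rng):
    """Simulate V' = +/-(1/2, -1/2), B = A V' + B0 with B0 ~ N(0, A), and average
    the squared error of H0 = (1/2)(g10 - g20) tanh((B1 - B2)/2)."""
    (a11, a12), (_, a22) = A
    l11 = math.sqrt(a11)                  # Cholesky factor of A
    l21 = a12 / l11
    l22 = math.sqrt(a22 - l21 ** 2)
    total = 0.0
    for _ in range(draws):
        v1 = 0.5 if rng.random() < 0.5 else -0.5
        v2 = -v1
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1 = a11 * v1 + a12 * v2 + l11 * e1
        b2 = a12 * v1 + a22 * v2 + l21 * e1 + l22 * e2
        h0 = 0.5 * (g10 - g20) * math.tanh(0.5 * (b1 - b2))
        total += (h0 - (v1 * g10 + v2 * g20)) ** 2
    return total / draws

A = ((1.5, 0.4), (0.4, 1.1))              # hypothetical A^{(ij)}
g10, g20 = 1.0, -0.5                      # hypothetical g_1(T+T1), g_2(T+T1)
D = A[0][0] - 2.0 * A[0][1] + A[1][1]
exact = expected_error_formula(g10, g20, D)
simulated = expected_error_monte_carlo(g10, g20, A, 50000, random.Random(2024))
bound = 0.25 * (g10 - g20) ** 2 * math.exp(-D / 8.0)
print(exact, simulated, bound)
```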


6. Miscellaneous remarks. A few remarks are in order comparing the method of Laning and Battin with the approach of this paper. These authors begin by considering the discrete problem in which observations are made over a discrete set $(t_1, \ldots, t_n)$ of time instants which is dense in the interval (0, T). The optimum filter is then obtained for this problem and it is heuristically argued that the results obtained for this case converge to well defined continuous operators. However, while the solution of the discrete problem is straightforward and intuitive, a justification of a direct passage to the limit does not seem to be easy. The indirect approach using the theory of integral equations is a natural and, perhaps, mathematically more elegant alternative. It must be pointed out that in this paper we have been concerned with only one type of loss function, viz., squared deviation, whereas the treatment in [1] assumes a loss function $\varphi(e)$ with merely the obvious restrictions that $\varphi$ be nonnegative and vanish for e = 0. It is not clear what precise restrictions on $\varphi$ are needed for a rigorous derivation of the results. The physical motivation behind the problem of Laning and Battin is clear and is of great interest to the statistician, but, to our mind, the apparent generality of the argument presented in [1] obscures the limitations of their method. Turning to condition (2.4) (or alternatively, to (3.8)-(3.9)), it is easy to see what some of these limitations must be. The existence of a square integrable solution of the integral equation (2.4), which is basic for our purposes, implies (a) that g(t) can be expanded in a series of the eigenfunctions of R(t, s), and (b) that the series $\sum_n\lambda_n^2a_n^2$ is convergent. The latter condition, in particular, is a restriction on the pair of functions R(t, s) and g(t). In terms of the original filtering problem this essentially restricts the admissible pairs of noise and message functions. For instance, consider the following simple example² with R(t, s) = min(t, s) and g(t) = t. In this case $\lambda_n = (n + \tfrac12)^2\pi^2T^{-2}$ and $a_n = (2/T)^{1/2}(-1)^n\lambda_n^{-1}$, so that the series $\sum_n\lambda_n^2a_n^2$ fails to converge. Now consider the discrete problem as in [1]. Letting $t_k = k\Delta$ ($k = 0, 1, \ldots, n$; $n\Delta = T$) be the time instants of observation, it can be shown by actual computation of the inverse of the matrix $(R(t_i, t_j))$ that, for this choice of g(t) and the covariance, the discrete optimum filter does not converge to a well defined continuous solution.
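The divergence in the min(t, s) example can be verified directly. Each term of the series is $\lambda_n^2a_n^2 = \lambda_n^2 \cdot (2/T)\lambda_n^{-2} = 2/T$, so the partial sums grow linearly. The sketch below checks this in three ways, with an arbitrary hypothetical T. It confirms numerically that $\sqrt{2/T}\sin((n+\tfrac12)\pi t/T)$ satisfies $\int_0^T\min(t, s)\phi_n(s)\,ds = \phi_n(t)/\lambda_n$. It compares the quadrature value of $a_n = \int_0^T t\,\phi_n(t)\,dt$ with the closed form quoted above. Finally, it tabulates the partial sums.

```python
import math

T = 2.0  # hypothetical observation length

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def lam(n):
    """lambda_n = (n + 1/2)^2 pi^2 / T^2, n = 0, 1, 2, ..."""
    return (n + 0.5) ** 2 * math.pi ** 2 / T ** 2

def phi(n, t):
    return math.sqrt(2.0 / T) * math.sin((n + 0.5) * math.pi * t / T)

def R_phi(n, t):
    """int_0^T min(t, s) phi_n(s) ds, split at s = t where min(t, s) has a kink."""
    return (simpson(lambda s: s * phi(n, s), 0.0, t)
            + t * simpson(lambda s: phi(n, s), t, T))

def a_numeric(n):
    """a_n = int_0^T g(t) phi_n(t) dt with g(t) = t."""
    return simpson(lambda t: t * phi(n, t), 0.0, T)

def a_closed(n):
    return math.sqrt(2.0 / T) * (-1) ** n / lam(n)

terms = [lam(n) ** 2 * a_closed(n) ** 2 for n in range(40)]
partial_sums = [sum(terms[:k]) for k in (10, 20, 40)]
print(partial_sums)
```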

In conclusion, we observe that the results of Section 3 can be viewed in the context of the familiar statistical problem of estimating a regression $\sum_{i=1}^m V_ig_i(t)$ where there is an a priori distribution $H(v_1, \ldots, v_m)$ on the regression parameters $V_i$.


REFERENCES 


[1] J. H. Laning, Jr. and R. H. Battin, Random Processes in Automatic Control, McGraw-Hill, 1956.


[2] R. Courant and D. Hilbert, Methoden der Mathematischen Physik, Vol. 1, Berlin, 1931.


[3] J. L. Doob, Stochastic Processes, Wiley, 1953.


² I am indebted to the first referee for this example and to the second referee for pointing out the connection with the regression problem indicated in the following paragraph.





ON A CHARACTERIZATION OF COVARIANCES 
By A. V. BALAKRISHNAN 
University of California at Los Angeles 


1. Introduction. Let F(s, t), $-\infty < s, t < \infty$, be a covariance function, that is to say $F(s, t) = \overline{F(t, s)}$ and F(s, t) is non-negative definite. Let m(s) be any complex valued function on $-\infty < s < \infty$. It is trivial that then

$$F(s, t) + m(s)\overline{m(t)}$$

is also a covariance. However, this is no longer true if we consider instead

$$F(s, t) - m(s)\overline{m(t)}. \qquad (1)$$

In this paper we obtain a set of necessary and sufficient conditions on m(s) in order that (1) be a covariance under the restriction that F(s, t) is a stationary covariance; i.e., F(s, t) = F(s − t). We also indicate an application of the result to the problem of estimating the mean value of a stochastic process.

2. Main results. 

THEOREM 1. Let R(t) be a continuous stationary covariance function with R(0) finite. Let m(s) be any function on $-\infty < s < \infty$. Then a necessary and sufficient condition that

$$R(s, t) = R(s - t) - m(s)\overline{m(t)} \qquad (2)$$

be a covariance function is that m(t) have the representation

$$m(t) = \int_{-\infty}^{\infty}\exp(it\lambda)\,d\mu(\lambda), \qquad (3)$$

where μ(·) is a function of bounded variation, and that, further,

$$\int_{-\infty}^{\infty}|d\mu/dG|^2\,dG \le 1, \qquad (4)$$

G(·) being the spectral distribution corresponding to R(t), so that

$$R(t) = \int_{-\infty}^{\infty}\exp(it\lambda)\,dG(\lambda). \qquad (5)$$


PROOF. Necessity: Let R(s, t) given by (2) be a covariance. Then we can (see [1], p. 72) construct a Gaussian process y(t), $-\infty < t < \infty$, with zero mean so that $E[y(s)\overline{y(t)}] = R(s, t)$. Now, since R(t, t) must be non-negative if (2) is to yield a covariance, m(t) is necessarily bounded. Letting

$$x(t) = y(t) + m(t),$$

we have $E[x(s)\overline{x(t)}] = R(s - t)$, so that the x(t) process has finite first and second moments and is stationary in the wide sense. Moreover, R(t) is continuous. Using the spectral representation theorem ([1], p. 527), we have


Received November 10, 1958; revised February 24, 1959. 
670 







$$x(t) = \int_{-\infty}^{\infty}\exp(it\lambda)\,dZ(\lambda),$$

where Z(λ) has orthogonal increments with $E[|dZ(\lambda)|^2] = dG(\lambda)$. Now let $\mu(\lambda) = E[Z(\lambda) - Z(\lambda_0)]$ for some fixed $\lambda_0$. Then, for any finite sequence of non-overlapping intervals $\{a_i, b_i\}$, we have

$$\Big|\sum_i(\mu(b_i) - \mu(a_i))\Big|^2 \le E\Big|\sum_i(Z(b_i) - Z(a_i))\Big|^2 = \sum_i\int_{a_i}^{b_i}dG,$$

so that μ(λ) is of bounded variation, and also absolutely continuous with respect to the measure dG on the Borel field of the real line. Moreover

$$E[x(t)] = m(t) = \int_{-\infty}^{\infty}\exp(it\lambda)\,d\mu(\lambda).$$

To prove (4), let f(·) be any function in $L_2(dG)$, the $L_2$ space with respect to the measure dG. Then

$$E\Big[\int_{-\infty}^{\infty}f(\lambda)\,dZ(\lambda)\Big] = \int_{-\infty}^{\infty}f(\lambda)\,d\mu(\lambda)$$

defines a linear functional on $L_2(dG)$. Denoting this functional by L(f), we have

$$|L(f)|^2 \le \int_{-\infty}^{\infty}|f(\lambda)|^2\,dG(\lambda) = \|f\|^2,$$

$\|f\|$ being the $L_2(dG)$ norm. Hence the norm, $\|L\|$, of the functional satisfies $\|L\| \le 1$. Moreover, there is a g(·) in $L_2(dG)$ so that for every f(·) in $L_2(dG)$,

$$\int_{-\infty}^{\infty}f(\lambda)\,d\mu = \int_{-\infty}^{\infty}f(\lambda)g(\lambda)\,dG.$$

This implies that g = dμ/dG and, since

$$\int_{-\infty}^{\infty}|g(\lambda)|^2\,dG = \|L\|^2 \le 1,$$

necessity follows.
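Sufficiency. If R(s, t) is defined by (2), and if the conditions (3) and (4) are satisfied, we have, for any finite sequence of numbers $\{a_j\}$ and any $\{t_j\}$,

$$\sum_i\sum_j a_iR(t_i, t_j)\bar a_j = \int_{-\infty}^{\infty}\Big|\sum_j a_j\exp(it_j\lambda)\Big|^2dG - \Big|\int_{-\infty}^{\infty}\sum_j a_j\frac{d\mu}{dG}\exp(it_j\lambda)\,dG\Big|^2,$$

where the second term, by the Schwarz inequality, is

$$\le \int_{-\infty}^{\infty}\Big|\sum_j a_j\exp(it_j\lambda)\Big|^2dG\int_{-\infty}^{\infty}|d\mu/dG|^2\,dG \le \int_{-\infty}^{\infty}\Big|\sum_j a_j\exp(it_j\lambda)\Big|^2dG,$$

using (4). Hence $\sum_i\sum_j a_iR(t_i, t_j)\bar a_j \ge 0$, as required.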
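Theorem 1 can be illustrated with a spectral distribution made of point masses, where condition (4) becomes a finite sum. The sketch assumes a hypothetical G with mass $C_k/2$ at each of $\pm F_k$, so that $R(t) = \sum_kC_k\cos(F_kt)$. It takes $m(t) = b\cos(F_1t)$, i.e. μ has mass b/2 at $\pm F_1$. Then $\int|d\mu/dG|^2dG = b^2/C_1$. The code forms $R(s - t) - m(s)m(t)$ on four hypothetical sample times and tests it with a Cholesky factorization. For these particular times the outcome agrees with whether (4) holds. The case b = 0.8 fails (4) even though every diagonal entry stays positive.

```python
import math

# Hypothetical spectral distribution G: mass C[k]/2 at each of +F[k] and -F[k],
# so that R(t) = int exp(it lambda) dG = sum_k C[k] cos(F[k] t).
C = (0.5, 0.5)
F = (1.0, 2.5)
TIMES = (0.0, 0.9, 2.3, 3.7)   # hypothetical sample times

def R(t):
    return sum(c * math.cos(f * t) for c, f in zip(C, F))

def mean_function(b):
    """m(t) = b cos(F[0] t), i.e. mu has mass b/2 at +F[0] and -F[0]."""
    return lambda t: b * math.cos(F[0] * t)

def condition_4(b):
    """int |dmu/dG|^2 dG = 2 * (b/2)^2 / (C[0]/2) = b^2 / C[0]."""
    return b * b / C[0]

def kernel_matrix(b):
    """Matrix of R(s - t) - m(s) m(t) over TIMES (m is real here)."""
    m = mean_function(b)
    return [[R(s - t) - m(s) * m(t) for t in TIMES] for s in TIMES]

def cholesky_is_pd(M):
    """True if every Cholesky pivot of the symmetric matrix M is positive."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

for b in (0.5, 0.8, 1.2):
    print(b, condition_4(b), cholesky_is_pd(kernel_matrix(b)))
```

A finite set of times can only certify failure. Success on four points does not prove non-negative definiteness for all point sets, which is what Theorem 1 guarantees under (4).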







If R(s, t) given by (2) is also required to be stationary, a stronger result is the following corollary:

COROLLARY 1. If R(t) is a stationary continuous covariance, a necessary and sufficient condition that

$$R(s, t) = R(s - t) - m(s)\overline{m(t)}$$

be also a stationary covariance is that

$$m(t) = a\exp(i\lambda_0t), \qquad \lambda_0 \text{ real}, \qquad (6)$$

where the spectral distribution, G(·), corresponding to R(t) has a jump (atomic part) at $\lambda_0$ not less than $|a|^2$.

PROOF. We first note that the required stationarity implies that

$$m(s)\overline{m(t)} = f(s - t),$$

which in turn necessarily implies that m(s) be of the form $m(s) = a\exp(i\lambda_0s)$, since, by Theorem 1, m(s) is continuous. The rest of the corollary is immediate from Theorem 1.

We have so far assumed R(t) to be continuous, so that (5) holds. In the absence of (5), m(t) may not have the representation (3). To see this, we have only to take a non-Lebesgue measurable character χ(t) of the real line, and set R(t) = 2χ(t), m(t) = χ(t), in (2). Then m(t) cannot have the form (3) or (6), since this would imply continuity, which is false.

Theorem 1 has, as may be expected, an immediate paraphrase for stochastic processes. In the terminology of Doob ([1], p. 95) a stochastic process

$$x(t), \qquad -\infty < t < \infty,$$

is stationary in the wide sense if $E[|x(t)|^2]$ is finite and

$$E[x(t)\overline{x(s)}] = R(t - s), \qquad (8)$$

without any additional assumption on the mean value E[x(t)]. As Doob has pointed out, the usual assumption that E[x(t)] be a constant is unnatural. On the other hand, (8) does impose a restriction on the character of E[x(t)], and this may be read from Theorem 1. Thus the following result may be stated.

THEOREM 2. Let x(t) be a stochastic process stationary in the wide sense, and continuous in the mean of order two. Let R(t) be its covariance function (given by (8)) with spectral distribution G(λ). Then a necessary and sufficient condition that a function m(t), $-\infty < t < \infty$, be the mean value of such a process is that it satisfy (3) and (4).


3. Application and extensions. As an application of this result we shall con- 
sider a problem that arises in the estimation of the mean value of a stochastic 
process. It has been treated by Grenander [2], [3] using the special concept of 
the Hellinger integral. We shall, for simplicity, use the discrete parameter ver- 
sion, since this has no essential bearing on the problem. Moreover, since the 







discrete parameter versions of Theorems 1 and 2 are obvious, we shall not state them separately. Thus let

$$y_n = x_n + mu_n \qquad (9)$$

be a time series, $-\infty < n < \infty$, where $x_n$ is a stationary time series with (finite) covariance R(n) and zero mean, and where $E[y_n] = mu_n$. Further, let $u_n$ have the form

$$u_n = \int_{-1/2}^{1/2}\exp(2\pi in\lambda)\,d\mu(\lambda).$$

It is desired to estimate the constant m from the $\{y_n\}$ series, and the question is: under what conditions do we have consistent (in the mean square sense) linear unbiased estimates for m? By this we mean that we wish to know whether it is possible to construct a sequence $t_n$ of random variables of the form

$$t_n = \sum_{k=-n}^{n}c_k^ny_k,$$

where the coefficients $c_k^n$ are to be so chosen that $E[t_n] = m$, and where, further, the sequence $t_n$ converges in the mean of order two to m. An answer to this question is given by Theorem 3.

THEOREM 3. A necessary and sufficient condition that consistent (in the mean square sense) linear unbiased estimates for m in (9) exist is that $\{mu_n\}$ not be a member of the class of sequences which can serve as the mean value of a wide sense stationary time series with covariance R(n), for any non-zero value of m.

PROOF. Necessity. Suppose, contrariwise, that, for some $m_0$ not equal to zero, the sequence $\{m_0u_n\}$ can be the mean value of a series with covariance R(n). Then, paraphrasing Theorem 2 to the discrete parameter case, we must have

$$m_0u_n = m_0\int_{-1/2}^{1/2}\exp(2\pi in\lambda)\,d\mu,$$

where

$$\int_{-1/2}^{1/2}|d\mu/dG|^2\,dG \le 1/m_0^2 < \infty,$$

with

$$R(n) = \int_{-1/2}^{1/2}\exp(2\pi in\lambda)\,dG.$$

Next, let $m_n^* = \sum_{k=-n}^nc_k^ny_k$ be any linear unbiased estimate for m. Then we must have $m = \sum_kc_k^nE[y_k] = m\sum_kc_k^nu_k$, so that, if

$$P_n(\lambda) = \sum_{k=-n}^nc_k^n\exp(2\pi ik\lambda),$$

then $\int_{-1/2}^{1/2}P_n(\lambda)\,d\mu = 1$. However,

$$\Big|\int_{-1/2}^{1/2}P_n(\lambda)\,d\mu\Big|^2 \le \int_{-1/2}^{1/2}|P_n(\lambda)|^2\,dG\int_{-1/2}^{1/2}|d\mu/dG|^2\,dG.$$

Hence

$$\int_{-1/2}^{1/2}|P_n(\lambda)|^2\,dG \ge 1\Big/\int_{-1/2}^{1/2}|d\mu/dG|^2\,dG,$$

or,

$$\operatorname{var} m_n^* \ge m_0^2 > 0.$$

Thus no consistent linear unbiased estimate is possible.
Sufficiency. Suppose $mu_n$ cannot be the mean value of a time series with covariance R(n). This can happen in only one of the two following ways:

(i) dμ is absolutely continuous with respect to dG, but

$$\int_{-1/2}^{1/2}|d\mu/dG|^2\,dG = +\infty;$$

(ii) dμ is not absolutely continuous with respect to dG.

First, suppose (ii) is true. Then we can find a Borel set B on which

$$\int_B dG = 0 \quad\text{and}\quad \int_B d\mu \ne 0.$$

Under these conditions, it is possible to construct a sequence of polynomials $P_n(\lambda)$ in $\exp(2\pi i\lambda)$ such that

$$\int_{-1/2}^{1/2}|P_n(\lambda)|^2\,dG \to \int_B dG = 0,$$

and

$$\int_{-1/2}^{1/2}P_n(\lambda)\,d\mu = 1.$$

But with each such polynomial, $P_n(\lambda) = \sum_kc_k^n\exp(2\pi ik\lambda)$, we can associate the linear unbiased estimate $m_n^* = \sum_kc_k^ny_k$, whose variance tends to zero, proving the existence of consistent linear unbiased estimates, as required.

Next, suppose alternative (i) holds. Then $g(\lambda) = d\mu/dG$ is Borel measurable, and

$$\int_{-1/2}^{1/2}|g(\lambda)|\,dG < \infty, \qquad (10)$$

$$\int_{-1/2}^{1/2}|g(\lambda)|^2\,dG = +\infty. \qquad (11)$$

In view of (10), for any polynomial P(λ) in exp(2πiλ), we have

$$\int_{-1/2}^{1/2}|P(\lambda)|\,|g(\lambda)|\,dG < \infty,$$

so that

$$\int_{-1/2}^{1/2}P(\lambda)\,d\mu = \int_{-1/2}^{1/2}P(\lambda)g(\lambda)\,dG \qquad (12)$$

defines a linear functional on the polynomials P(λ), which form a dense subspace of $L_2(dG)$. Now to show the existence of consistent linear unbiased estimates for m, it is enough to show that we can find a sequence of polynomials $P_n(\lambda)$ in $\exp(2\pi i\lambda)$ such that

$$\int_{-1/2}^{1/2}P_n(\lambda)\,d\mu = 1, \qquad \int_{-1/2}^{1/2}|P_n(\lambda)|^2\,dG \to 0.$$

However, if this is not true, (12) would define a bounded linear functional on $L_2(dG)$ (because of continuity on a dense subspace), thus contradicting (11).
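Theorem 3 can be illustrated with the simplest unbiased estimate, the sample mean of $y_{-n}, \ldots, y_n$, when $u_n \equiv 1$ (μ is a unit mass at λ = 0). The sketch below computes its variance exactly from the covariance sequence for three hypothetical cases. The first is white noise, whose flat spectrum makes μ singular with respect to G. The second is a first-order autoregressive covariance, whose continuous spectral density has the same effect. The third is white noise plus a constant covariance $\tau^2$, which puts a spectral jump at 0. In that case μ is absolutely continuous with respect to G with $\int|d\mu/dG|^2dG = 1/\tau^2$, and the proof above gives the lower bound $\tau^2$. The parameter values are illustrative.

```python
def sample_mean_variance(R, n):
    """Variance of the sample mean of y_{-n}, ..., y_n when Cov(y_j, y_k) = R(j - k)."""
    size = 2 * n + 1
    return sum((size - abs(d)) * R(d) for d in range(-2 * n, 2 * n + 1)) / size ** 2

sigma2, tau2, rho = 1.0, 0.25, 0.5   # hypothetical parameters

def white(d):
    return sigma2 if d == 0 else 0.0

def with_atom(d):
    # white noise plus a spectral jump of size tau2 at lambda = 0
    return (sigma2 if d == 0 else 0.0) + tau2

def ar1(d):
    # covariance with a continuous spectral density
    return rho ** abs(d) / (1.0 - rho ** 2)

for n in (10, 100, 1000):
    print(n, sample_mean_variance(white, n), sample_mean_variance(with_atom, n),
          sample_mean_variance(ar1, n))
```

The sample mean is only one linear unbiased estimate, so this illustrates the theorem but does not prove it. In the atomic case, though, the bound $\operatorname{var} m_n^* \ge \tau^2$ holds for every linear unbiased estimate.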

It would appear that the basic result (Theorem 1) is capable of extension to 
the harmonizable covariances of Loéve [4]. In this connection, it may be noted 
that R(s, t) in Theorem 1 is easily verified to be harmonizable. 


REFERENCES 
[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.
[2] U. Grenander, "On Toeplitz forms and stationary processes," Arkiv för Matematik, Vol. 1 (1952), pp. 555-571.
[3] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications, University of California Press, Berkeley, 1957.
[4] M. Loève, Probability Theory, Van Nostrand Co., New York, 1955.





ON ASYMPTOTIC DISTRIBUTIONS OF ESTIMATES OF PARAMETERS 
OF STOCHASTIC DIFFERENCE EQUATIONS 


T. W. ANDERSON 
Columbia University and Center for Advanced Study in the Behavioral Sciences 
1. Summary and introduction. Let $x_t$ ($t = 1, 2, \ldots$) be defined recursively by

$$x_t = ax_{t-1} + u_t, \qquad t = 1, 2, \ldots, \qquad (1.1)$$

where $x_0$ is a constant, $\mathcal{E}u_t = 0$, $\mathcal{E}u_t^2 = \sigma^2$ and $\mathcal{E}u_tu_s = 0$, $t \ne s$. ($\mathcal{E}$ denotes mathematical expectation.) An estimate of a based on $x_1, \ldots, x_T$ (which is the maximum likelihood estimate of a if the u's are normally distributed) is

$$\hat a = \Big(\sum_{t=1}^T x_tx_{t-1}\Big)\Big/\Big(\sum_{t=1}^T x_{t-1}^2\Big). \qquad (1.2)$$

If $|a| < 1$, $\sqrt{T}(\hat a - a)$ has a limiting normal distribution with mean 0 under fairly general conditions such as independence of the u's and uniformly bounded moments of the u's of order $4 + \epsilon$, for some $\epsilon > 0$. (See [2], Chapter II, for example.) If $|a| > 1$, White [3] has shown that $(\hat a - a)|a|^T/(a^2 - 1)$ has a limiting Cauchy distribution under the assumption that $x_0 = 0$ and the u's are normally distributed; he has also found the distribution when $x_0 \ne 0$. His results can be easily modified and restated in the following form: $(\sum_{t=1}^T x_{t-1}^2)^{1/2}(\hat a - a)$ has a limiting normal distribution if the u's are normally distributed and if $|a| \ne 1$. Peculiarly, for $|a| = 1$ this statistic has a limiting distribution which is not normal (and is not even symmetric for $x_0 = 0$). One purpose of this paper is to characterize the limiting distributions for $|a| > 1$ when the u's are not necessarily normally distributed; it will be shown that for $|a| > 1$ the results depend on the distribution of the u's. Central limit theorems are not applicable.

Secondly, the limiting distribution for $|a| < 1$ will be shown to hold under the assumption that the u's are independently, identically distributed with finite variance. This was conjectured by White.


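2. Asymptotic distributions in the unstable case. Here $|\alpha| > 1$. Let

$$A_T = \sum_{t=1}^{T} x_t x_{t-1} - \alpha \sum_{t=1}^{T} x_{t-1}^2 = \sum_{t=1}^{T} u_t x_{t-1}, \tag{2.1}$$

$$B_T = \sum_{t=1}^{T} x_{t-1}^2. \tag{2.2}$$

Then $\hat\alpha - \alpha = A_T/B_T$. Note that

$$x_t = \alpha x_{t-1} + u_t = \alpha(\alpha x_{t-2} + u_{t-1}) + u_t = \cdots = u_t + \alpha u_{t-1} + \cdots + \alpha^{t-1} u_1 + \alpha^t x_0. \tag{2.3}$$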


Received August 25, 1958; revised February 17, 1959. 





STOCHASTIC DIFFERENCE EQUATIONS 
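Let $\beta = 1/\alpha$ and let

$$z_t = \beta^{t-2} x_{t-1} = u_1 + \beta u_2 + \cdots + \beta^{t-2} u_{t-1} + \alpha x_0. \tag{2.4}$$

It is easily verified that $\mathcal{E}z_t = \alpha x_0$ and $\operatorname{Var} z_t \to \sigma^2/(1 - \beta^2)$ as $t \to \infty$.

THEOREM 2.1.

$$\operatorname*{plim}_{T \to \infty}\left[\beta^{2(T-2)} B_T - \frac{z_T^2}{1 - \beta^2}\right] = 0. \tag{2.5}$$

PROOF. We shall show that

$$\beta^{2(T-2)} B_T - \frac{z_T^2}{1 - \beta^2} = \sum_{s=1}^{T-1} \beta^{2s}\left(z_{T-s}^2 - z_T^2\right) - \frac{\beta^{2T} z_T^2}{1 - \beta^2} \tag{2.6}$$

converges stochastically to 0. From (1.1) and (2.4) we find

$$z_t = z_{t-1} + \beta^{t-2} u_{t-1}, \tag{2.7}$$

and hence

$$z_T = \beta^{T-2} u_{T-1} + \cdots + \beta^{T-s-1} u_{T-s} + z_{T-s}. \tag{2.8}$$

We shall use the results that

$$\mathcal{E}(z_T - z_{T-s})^2 = \sigma^2 \beta^{2(T-s-1)} \frac{1 - \beta^{2s}}{1 - \beta^2} \le \frac{\sigma^2 \beta^{2(T-s-1)}}{1 - \beta^2}, \tag{2.9}$$

$$\mathcal{E}(z_{T-s} + z_T)^2 \le 2\mathcal{E}z_{T-s}^2 + 2\mathcal{E}z_T^2 \le 4\left(\frac{\sigma^2}{1 - \beta^2} + \alpha^2 x_0^2\right). \tag{2.10}$$

Then

$$\sum_{s=1}^{T-1} \beta^{2s}\, \mathcal{E}\left|(z_{T-s} + z_T)(z_{T-s} - z_T)\right| \le \sum_{s=1}^{T-1} \beta^{2s}\left[\mathcal{E}(z_{T-s} + z_T)^2\, \mathcal{E}(z_{T-s} - z_T)^2\right]^{1/2} \le 2\left[\frac{\sigma^2}{1 - \beta^2}\left(\frac{\sigma^2}{1 - \beta^2} + \alpha^2 x_0^2\right)\right]^{1/2} \frac{|\beta|^T}{1 - |\beta|}. \tag{2.11}$$

By Tchebycheff's inequality

$$\Pr\left\{\left|\sum_{s=1}^{T-1} \beta^{2s}\left(z_{T-s}^2 - z_T^2\right)\right| \ge \epsilon\right\} \le \frac{K |\beta|^T}{\epsilon}, \tag{2.12}$$

where $K$ is a constant, and for $T$ sufficiently large this is arbitrarily small. Since

$$\mathcal{E}\,\frac{\beta^{2T} z_T^2}{1 - \beta^2} \le \beta^{2T} C \tag{2.13}$$

for $C$ a suitable positive constant, the term in (2.6) converges in probability to 0 and the theorem follows.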

The convergence in (2.5) is also with probability 1. The sum of (2.12) for $T = 1, 2, \cdots$ converges, and similarly for (2.13). Hence, by the Borel–Cantelli lemma, (2.6) converges to 0 with probability 1.

It should be observed that $z_T$ will have a limiting distribution. (In fact $\sum_{t=1}^{\infty} \beta^{t-1} u_t$ converges in the mean and with probability 1.) It will also be noted that $\beta^{2(T-2)} B_T$ is in the limit a nondegenerate random variable; if $\beta^{2(T-2)}$ is replaced by a function of $T$ that decreases faster, then the resulting random variable converges stochastically to 0.
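Theorem 2.1 can be checked numerically on a single simulated explosive path. The sketch below assumes standard normal disturbances ($\sigma = 1$) and uses a hypothetical helper `theorem_2_1_gap` that evaluates the left-hand side of (2.6) directly from (2.2) and (2.4); by the theorem it should shrink at a geometric rate as $T$ grows.

```python
import random


def theorem_2_1_gap(alpha, T, x0=0.0, seed=0):
    """beta^{2(T-2)} B_T - z_T^2 / (1 - beta^2) for one simulated explosive path.

    Assumes |alpha| > 1 and standard normal disturbances (sigma = 1).
    """
    rng = random.Random(seed)
    beta = 1.0 / alpha
    x = [x0]
    for _ in range(T):
        x.append(alpha * x[-1] + rng.gauss(0.0, 1.0))
    B_T = sum(x[t - 1] ** 2 for t in range(1, T + 1))  # (2.2)
    z_T = beta ** (T - 2) * x[T - 1]                    # (2.4) with t = T
    return beta ** (2 * (T - 2)) * B_T - z_T ** 2 / (1 - beta ** 2)


for T in (10, 20, 40):
    print(T, theorem_2_1_gap(1.5, T))
```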





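Let

$$y_T = u_T + \beta u_{T-1} + \cdots + \beta^{T-2} u_2 + \beta^{T-1} u_1. \tag{2.14}$$

THEOREM 2.2.

$$\operatorname*{plim}_{T \to \infty}\left(\beta^{T-2} A_T - y_T z_T\right) = 0. \tag{2.15}$$

PROOF. We have

$$\beta^{T-2} A_T - y_T z_T = \sum_{s=1}^{T-1} \beta^{s} u_{T-s}\left(z_{T-s} - z_T\right).$$

Then

$$\begin{aligned} \mathcal{E}\left|\beta^{T-2} A_T - y_T z_T\right| &\le \sum_{s=1}^{T-1} |\beta|^{s}\, \mathcal{E}\left|u_{T-s}(z_{T-s} - z_T)\right| \\ &\le \sum_{s=1}^{T-1} |\beta|^{s}\left[\mathcal{E}u_{T-s}^2\, \mathcal{E}(z_{T-s} - z_T)^2\right]^{1/2} \\ &\le \frac{\sigma^2}{\sqrt{1 - \beta^2}} \sum_{s=1}^{T-1} |\beta|^{T-1} = \frac{\sigma^2}{\sqrt{1 - \beta^2}}\,(T - 1)|\beta|^{T-1}. \end{aligned} \tag{2.16}$$

Since (2.16) converges to 0, the Tchebycheff inequality implies the theorem. Since the sum of (2.16) for $T = 1, 2, \cdots$ converges, the Borel–Cantelli lemma implies convergence with probability 1.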
It will be noticed that $y_T$ has the same form as $z_T$ except that $x_0$ does not appear, the order of the $u$'s is reversed, and there is one more term. Under the assumptions we have made, $y_T$ does not necessarily have a limiting distribution. For example, $u_T$ makes a non-negligible contribution to $y_T$; if the $u$'s are independent and if the sequence of distributions of $u_T$ is wildly fluctuating, $y_T$ will not have a limiting distribution. However, if the $u$'s are independent and identically distributed, $y_T$ has the same limiting distribution as $z_T$ for $x_0 = 0$. The covariance between $y_T$ and $z_T$ is $(T - 1)\sigma^2\beta^{T-1}$, which converges to 0.

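THEOREM 2.3.¹ If the $u$'s are independently distributed, and if $y_T$ has a limiting distribution, then $(y_T, z_T)$ has a limiting distribution, say the distribution of $(y, z)$, and $y$ and $z$ are independent.

PROOF. Let

$$z_T^* = \sum_{t=1}^{[\frac{1}{2}T]} \beta^{t-1} u_t + \alpha x_0, \tag{2.17}$$

$$z_T - z_T^* = \sum_{t=[\frac{1}{2}T]+1}^{T-1} \beta^{t-1} u_t, \tag{2.18}$$

$$y_T^* = \sum_{t=[\frac{1}{2}T]+1}^{T} \beta^{T-t} u_t, \tag{2.19}$$

$$y_T - y_T^* = \sum_{t=1}^{[\frac{1}{2}T]} \beta^{T-t} u_t, \tag{2.20}$$

where $[\frac{1}{2}T]$ is the largest integer not greater than $\frac{1}{2}T$. Then $z_T^*$ and $y_T^*$ are independently distributed because they involve disjoint sets of $u$'s. We have

$$\mathcal{E}(z_T - z_T^*)^2 \le \frac{\sigma^2 \beta^{2[\frac{1}{2}T]}}{1 - \beta^2}, \tag{2.21}$$

$$\mathcal{E}(y_T - y_T^*)^2 \le \frac{\sigma^2 \beta^{2(T - [\frac{1}{2}T])}}{1 - \beta^2}. \tag{2.22}$$

Then $z_T - z_T^*$ and $y_T - y_T^*$ converge stochastically and with probability 1 to 0, and the theorem follows.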

THEOREM 2.4. If $(y_T, z_T)$ has a limiting distribution, say the distribution of $(y, z)$, then $\left(\beta^{T-2} A_T,\ (1 - \beta^2)\beta^{2(T-2)} B_T\right)$ has a limiting distribution, the distribution of $(yz, z^2)$.


¹ This theorem, as well as several other points, was suggested by Julius Blum.







THEOREM 2.5. If $(y_T, z_T)$ has a limiting distribution, the distribution of $(y, z)$, and if $\Pr\{z = 0\} = 0$, then $[\alpha^T/(\alpha^2 - 1)](\hat\alpha - \alpha)$ has as a limiting distribution the distribution of $y/z$.
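A rough numerical illustration of Theorems 2.2, 2.4 and 2.5, assuming $x_0 = 0$ and standard normal disturbances (any independent, identically distributed disturbances could be substituted): for a long explosive path, the normalized error $[\alpha^T/(\alpha^2 - 1)](\hat\alpha - \alpha)$ should be almost indistinguishable from $y_T/z_T$ built from the same $u$'s via (2.14) and (2.4). The function name `theorem_2_5_check` is hypothetical.

```python
import random


def theorem_2_5_check(alpha, T, seed=0):
    """Return ([alpha^T/(alpha^2 - 1)](alpha_hat - alpha), y_T / z_T) for one path.

    Assumes x_0 = 0, |alpha| > 1 and standard normal disturbances.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(T)]   # u[t - 1] holds u_t
    x = [0.0]
    for u_t in u:
        x.append(alpha * x[-1] + u_t)
    A_T = sum(u[t - 1] * x[t - 1] for t in range(1, T + 1))          # (2.1)
    B_T = sum(x[t - 1] ** 2 for t in range(1, T + 1))                 # (2.2)
    beta = 1.0 / alpha
    y_T = sum(beta ** (T - t) * u[t - 1] for t in range(1, T + 1))    # (2.14)
    z_T = sum(beta ** (t - 1) * u[t - 1] for t in range(1, T))        # (2.4), x_0 = 0
    normalized = alpha ** T / (alpha ** 2 - 1) * (A_T / B_T)          # alpha_hat - alpha = A_T / B_T
    return normalized, y_T / z_T


lhs, rhs = theorem_2_5_check(1.5, 80)
print(lhs, rhs)
```

The two numbers agree to many digits because the discrepancies in Theorems 2.1 and 2.2 are of order $T|\beta|^T$.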

THEOREM 2.6. If the $u$'s are independently normally distributed, the limiting distribution of $(y_T, z_T)$ is normal with variances $\sigma^2/(1 - \beta^2)$, correlation 0, $\mathcal{E}y = 0$ and $\mathcal{E}z = \alpha x_0$.

It will be observed that if the $u$'s are independent and not all normally distributed, then $z_T$ does not have a limiting normal distribution. For example, $u_1$ is not negligible; if it is not normal, $z$ is not normal (since a convolution is normal only if the two component distributions are normal).

In the case of $y_T$, if all the $u$'s beyond some $t$ are normal and independent, then $y_T$ will have a limiting normal distribution. If the $u$'s are independently and identically distributed, then $y_T$ has a limiting normal distribution if and only if the $u$'s are normally distributed. If $\alpha$ is an integer and if the $u$'s are independently distributed according to a rectangular discrete distribution over $0, 1, \cdots, \alpha - 1$, then the limiting distribution of $y_T$ is (continuous) uniform on $(0, \alpha)$. It will be noted that central limit theorems are not applicable here.
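The rectangular example can be simulated directly. In effect $y_T$ is then a base-$\alpha$ expansion with random digits, which is why the limit is uniform. The sketch below (hypothetical helper `simulate_y_T`, with $\alpha = 3$ chosen for illustration) draws $y_T$ from (2.14) with digits uniform on $0, 1, \cdots, \alpha - 1$; as in the text's example, these $u$'s do not have mean 0. The draws should lie in $[0, \alpha)$ with mean near $\alpha/2$ and variance near $\alpha^2/12$, as for a continuous uniform distribution on $(0, \alpha)$.

```python
import random


def simulate_y_T(alpha, T, rng):
    """y_T = u_T + beta u_{T-1} + ... + beta^{T-1} u_1, (2.14), with u_t uniform on {0, ..., alpha-1}."""
    beta = 1.0 / alpha
    return sum(beta ** (T - t) * rng.randrange(alpha) for t in range(1, T + 1))


rng = random.Random(1)
alpha, T, n = 3, 30, 20000
draws = [simulate_y_T(alpha, T, rng) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
print(min(draws) >= 0, max(draws) < alpha)  # True True
print(mean, var)  # close to alpha/2 = 1.5 and alpha**2/12 = 0.75
```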

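THEOREM 2.7. If the $u$'s are independently normally distributed and if $x_0 = 0$, $[\alpha^T/(\alpha^2 - 1)](\hat\alpha - \alpha)$ has a Cauchy distribution as a limiting distribution.

THEOREM 2.8. If the $u$'s are independently normally distributed, $\left(\sum_{t=1}^{T} x_{t-1}^2\right)^{1/2}(\hat\alpha - \alpha)$ has a limiting normal distribution with mean 0 and variance $\sigma^2$.

THEOREM 2.9. If the $u$'s are independently normally distributed, the limiting moment generating function of $\beta^{T-2}(1 - \beta^2)A_T/\sigma^2$ and $\beta^{2(T-2)}(1 - \beta^2)^2 B_T/\sigma^2$ is

$$(1 - U^2 - 2V)^{-1/2} \exp\left[\frac{(\alpha^2 - 1)x_0^2}{2\sigma^2} \cdot \frac{U^2 + 2V}{1 - U^2 - 2V}\right]. \tag{2.23}$$

PROOF. We have

$$\begin{aligned} \mathcal{E}e^{(Uyz + Vz^2)(1 - \beta^2)/\sigma^2} &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1 - \beta^2}{2\pi\sigma^2}\, e^{-\frac{1}{2}(1 - \beta^2)[y^2 + (z - \alpha x_0)^2]/\sigma^2 + (Uyz + Vz^2)(1 - \beta^2)/\sigma^2}\, dy\, dz \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1 - \beta^2}{2\pi\sigma^2}\, e^{-\frac{1}{2}(1 - \beta^2)[y^2 - 2Uyz + (1 - 2V)z^2 - 2z\alpha x_0 + \alpha^2 x_0^2]/\sigma^2}\, dy\, dz, \end{aligned} \tag{2.24}$$

which is (2.23). This was given by White [3].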

Theorem 2.8 permits setting up tests of hypotheses about $\alpha$ and forming confidence intervals for $\alpha$ if the $u$'s are independently normally distributed. In the case of $|\alpha| < 1$, the result holds without the assumption of normality (see Section 4). It should be emphasized that statistical procedures based on the asymptotic normal distribution of $\left(\sum_{t=1}^{T} x_{t-1}^2\right)^{1/2}(\hat\alpha - \alpha)$ have wide scope when $|\alpha| < 1$, but when $|\alpha| > 1$ they are justified only if the $u$'s are normal.
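Under the conditions of Theorem 2.8, an approximate confidence interval is $\hat\alpha \pm z\,\hat\sigma/\left(\sum x_{t-1}^2\right)^{1/2}$, with $z$ a standard normal quantile. The sketch below is one way to compute it; estimating $\sigma^2$ by the mean squared residual is our assumption (the text does not specify an estimate), and the function names are hypothetical. The small coverage experiment uses normal disturbances, in line with the normality requirement when $|\alpha| > 1$.

```python
import math
import random
from statistics import NormalDist


def ar1_confidence_interval(x, level=0.95):
    """Return (lower, alpha_hat, upper) based on the normal limit in Theorem 2.8.

    x = [x_0, x_1, ..., x_T]. sigma^2 is estimated by the mean squared residual,
    which is an assumption made for this sketch.
    """
    T = len(x) - 1
    B_T = sum(x[t - 1] ** 2 for t in range(1, T + 1))
    a_hat = sum(x[t] * x[t - 1] for t in range(1, T + 1)) / B_T
    rss = sum((x[t] - a_hat * x[t - 1]) ** 2 for t in range(1, T + 1))
    sigma_hat = math.sqrt(rss / T)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma_hat / math.sqrt(B_T)
    return a_hat - half_width, a_hat, a_hat + half_width


def coverage(alpha, T, reps, seed=0):
    """Fraction of simulated paths (normal u's, x_0 = 0) whose interval contains alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [0.0]
        for _ in range(T):
            x.append(alpha * x[-1] + rng.gauss(0.0, 1.0))
        lower, _, upper = ar1_confidence_interval(x)
        hits += lower <= alpha <= upper
    return hits / reps


print(coverage(1.2, 60, 1000))  # should be near the nominal 0.95
```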

If $\alpha = 1$, $x_t = \sum_{s=1}^{t} u_s + x_0$ and the numerator of $\hat\alpha - \alpha$ is

$$A_T = \sum_{t=1}^{T} u_t x_{t-1} = \sum_{s<t} u_t u_s + x_0 \sum_{t=1}^{T} u_t = \frac{1}{2}\left[\left(\sum_{t=1}^{T} u_t\right)^2 - \sum_{t=1}^{T} u_t^2\right] + x_0 \sum_{t=1}^{T} u_t. \tag{2.25}$$

In this case the normalizing factor for $A_T$ is $T$. Then

$$\frac{1}{T} A_T = \frac{1}{2}\left(\sum_{t=1}^{T} u_t\Big/\sqrt{T}\right)^2 - \frac{1}{2}\sum_{t=1}^{T} u_t^2\Big/T + x_0 \sum_{t=1}^{T} u_t\Big/T. \tag{2.26}$$

If the $u$'s are independently and identically distributed, $\sum u_t/T$ converges stochastically to $\mathcal{E}u_t = 0$, $\sum u_t^2/T$ converges stochastically to $\mathcal{E}u_t^2 = \sigma^2$, and $\sum u_t/\sqrt{T}$ has a limiting normal distribution with mean 0 and variance $\sigma^2$. Thus the limiting distribution of $A_T/T$ is that of $\frac{1}{2}x^2 - \frac{1}{2}\sigma^2$, where $x$ has a normal distribution with mean 0 and variance $\sigma^2$. From this it is clear that $A_T$ multiplied by any nonnegative function of the observations cannot have a limiting normal distribution with mean 0, since

$$\lim_{T \to \infty} \Pr\{A_T \le 0\} = \Pr\{x^2 \le \sigma^2\}, \tag{2.27}$$

which is not $\frac{1}{2}$. White has observed that if the $u$'s are independently normally distributed and if $x_0 = 0$, the limiting distribution of

$$T(\hat\alpha - \alpha) = \frac{A_T/T}{B_T/T^2} \tag{2.28}$$

is that of

$$\frac{\frac{1}{2}\left[x^2(1) - 1\right]}{\int_0^1 x^2(t)\, dt}, \tag{2.29}$$

where $x(t)$ is the Wiener stochastic process with $\mathcal{E}x(t) = 0$ and $\mathcal{E}x^2(t) = t$, and he has given the limiting characteristic function of $(A_T/T, B_T/T^2)$.
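The non-normality in the case $\alpha = 1$ is easy to see by simulation. The sketch below assumes $x_0 = 0$ and disturbances uniform on $(-1, 1)$ (independent and identically distributed, as the argument requires, but deliberately non-normal) and estimates $\Pr\{A_T \le 0\}$, which by (2.27) should approach $\Pr\{x^2 \le \sigma^2\} = \Pr\{|N(0, 1)| \le 1\} \approx 0.683$ rather than $\frac{1}{2}$. The helper name is hypothetical.

```python
import math
import random


def prob_A_nonpositive(T, reps, seed=0):
    """Monte Carlo estimate of Pr{A_T <= 0} when alpha = 1 and x_0 = 0.

    The u's are assumed i.i.d. uniform on (-1, 1).
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        x_prev, A = 0.0, 0.0
        for _ in range(T):
            u = rng.uniform(-1.0, 1.0)
            A += u * x_prev   # A_T = sum of u_t x_{t-1}, as in (2.25)
            x_prev += u       # x_t = x_{t-1} + u_t
        count += A <= 0
    return count / reps


limit = math.erf(1 / math.sqrt(2))    # Pr{x^2 <= sigma^2} for x ~ N(0, sigma^2)
print(round(limit, 4))                # 0.6827
print(prob_A_nonpositive(200, 4000))  # should be close to 0.683, not 0.5
```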

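It might be noted that in the case of $|\alpha| > 1$, the condition $\mathcal{E}u_t^2 = \sigma^2$ could be replaced by the condition $\mathcal{E}u_t^2 = \sigma_t^2 \le M$ for some $M$. The results would involve such modifications as replacing $\sigma^2/(1 - \beta^2)$ by

$$\sum_{t=1}^{\infty} \beta^{2(t-1)} \sigma_t^2 \quad \left[\le M/(1 - \beta^2)\right].$$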


3. Asymptotic distributions in the unstable vector case. Let $x_t$ and $u_t$ be $p$-component column vectors and $\alpha$ a $p \times p$ matrix. Let the process be defined by (1.1), where $x_0$ is a vector of constants, $\mathcal{E}u_t = 0$, $\mathcal{E}u_t u_t' = \Sigma$ and $\mathcal{E}u_t u_s' = 0$, $t \neq s$. The estimate of $\alpha$ is

$$\hat\alpha = \left(\sum_{t=1}^{T} x_t x_{t-1}'\right)\left(\sum_{t=1}^{T} x_{t-1} x_{t-1}'\right)^{-1}. \tag{3.1}$$

The process is stable if all the characteristic roots of $\alpha$ are less than 1 in absolute value; in this section we shall consider the case that all $p$ characteristic roots are greater than 1 in absolute value. The methods for the scalar case can be used here, but the results are more complicated. A more general case would include matrices $\alpha$ with some roots less and some roots greater than 1 in absolute value, but this would be much more involved.


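Let

$$A_T = \sum_{t=1}^{T} x_t x_{t-1}' - \alpha \sum_{t=1}^{T} x_{t-1} x_{t-1}' = \sum_{t=1}^{T} u_t x_{t-1}', \tag{3.2}$$

$$B_T = \sum_{t=1}^{T} x_{t-1} x_{t-1}', \tag{3.3}$$

$$x_t = \alpha x_{t-1} + u_t = u_t + \alpha u_{t-1} + \cdots + \alpha^{t-1} u_1 + \alpha^t x_0, \tag{3.4}$$

$$z_t = \sum_{s=1}^{t-1} \alpha^{-(s-1)} u_s + \alpha x_0, \tag{3.5}$$

$$F_T = z_T z_T' + \alpha^{-1} z_T z_T' \alpha^{-1\prime} + \cdots + \alpha^{-(T-1)} z_T z_T' \alpha^{-(T-1)\prime}, \tag{3.6}$$

$$G_T = u_T z_T' + u_{T-1} z_T' \alpha^{-1\prime} + \cdots + u_1 z_T' \alpha^{-(T-1)\prime}. \tag{3.7}$$

Then

$$\hat\alpha - \alpha = A_T B_T^{-1}. \tag{3.8}$$

THEOREM 3.1.

$$\operatorname*{plim}_{T \to \infty}\left(\alpha^{-(T-2)} B_T \alpha^{-(T-2)\prime} - F_T\right) = 0. \tag{3.9}$$

THEOREM 3.2.

$$\operatorname*{plim}_{T \to \infty}\left(A_T \alpha^{-(T-2)\prime} - G_T\right) = 0. \tag{3.10}$$

These theorems are proved by methods similar to those used for Theorems 2.1 and 2.2. The convergence in each case is also with probability 1.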

Suppose that $\alpha$ is a matrix such that there exists a nonsingular matrix $c$ such that

$$c\,\alpha\, c^{-1} = \lambda, \tag{3.11}$$

where $\lambda$ is a diagonal matrix with the characteristic roots of $\alpha$ as the diagonal elements. Then $\alpha = c^{-1}\lambda c$, $\alpha^{-1} = c^{-1}\lambda^{-1}c$, and $\alpha^t = c^{-1}\lambda^t c$. Let $\lambda^{-1} = \gamma$. Then

$$\begin{aligned} F_T &= z_T z_T' + c^{-1}\gamma c\, z_T z_T' c'\gamma c'^{-1} + \cdots + c^{-1}\gamma^{T-1} c\, z_T z_T' c'\gamma^{T-1} c'^{-1} \\ &= c^{-1}\left(c z_T z_T' c' + \gamma c z_T z_T' c'\gamma + \cdots + \gamma^{T-1} c z_T z_T' c'\gamma^{T-1}\right)c'^{-1}. \end{aligned} \tag{3.12}$$

The $i,j$th element of the matrix in parentheses is the $i,j$th element of $c z_T z_T' c'$ multiplied by

$$1 + \gamma_i\gamma_j + \cdots + (\gamma_i\gamma_j)^{T-1} = \frac{1 - (\gamma_i\gamma_j)^T}{1 - \gamma_i\gamma_j}, \tag{3.13}$$

where $\gamma_i$ is the $i$th diagonal element of $\gamma$. This converges to $1/(1 - \gamma_i\gamma_j)$. Then the $i,j$th element of $c F_T c'$ is asymptotically the $i,j$th element of $c z_T z_T' c'$ divided by $1 - \gamma_i\gamma_j$. Let $\Gamma$ be the matrix with $1/(1 - \gamma_i\gamma_j)$ as the $i,j$th element, and let $Z_T$ be a diagonal matrix with $i$th diagonal element equal to the $i$th element of $c z_T$.

COROLLARY 3.1.

$$\operatorname*{plim}_{T \to \infty}\left(F_T - c^{-1} Z_T \Gamma Z_T c'^{-1}\right) = 0. \tag{3.14}$$

Now consider

$$\begin{aligned} G_T &= u_T z_T' + u_{T-1} z_T' c'\gamma c'^{-1} + \cdots + u_1 z_T' c'\gamma^{T-1} c'^{-1} \\ &= \left(u_T z_T' c' + u_{T-1} z_T' c'\gamma + \cdots + u_1 z_T' c'\gamma^{T-1}\right)c'^{-1}. \end{aligned} \tag{3.15}$$

The $j$th column of the matrix in parentheses is the $j$th element of $z_T' c'$ times

$$u_T + \gamma_j u_{T-1} + \cdots + \gamma_j^{T-1} u_1. \tag{3.16}$$

Let this be the $j$th column of a matrix $Y_T$.

COROLLARY 3.2.

$$\operatorname*{plim}_{T \to \infty}\left(G_T - Y_T Z_T c'^{-1}\right) = 0. \tag{3.17}$$

It should be noted that $\gamma$ and $c$ do not need to be real, but $c^{-1} Z_T \Gamma Z_T c'^{-1}$ and $Y_T Z_T c'^{-1}$ will be real. In fact the diagonal elements of $Z_T$ are the elements of $c z_T$, where $c$ is complex and $z_T$ consists of real random variables. The elements of $Y_T$ are complex linear combinations of real random variables. When we speak of $Y_T$ having a limiting distribution we mean that the set of real random variables has a limiting distribution. (The coefficients of the linear combinations remain fixed.)

THEOREM 3.3. If the $u$'s are independently distributed and if $Y_T$ has a limiting distribution, then $(Y_T, Z_T)$ has a limiting distribution, say the distribution of $(Y, Z)$, and $Y$ and $Z$ are independent.

THEOREM 3.4. If $(Y_T, Z_T)$ has a limiting distribution, say the distribution of $(Y, Z)$, then $\left(A_T \alpha^{-(T-2)\prime},\ \alpha^{-(T-2)} B_T \alpha^{-(T-2)\prime}\right)$ has a limiting distribution, the distribution of $(Y Z c'^{-1},\ c^{-1} Z \Gamma Z c'^{-1})$.

THEOREM 3.5. If $(Y_T, Z_T)$ has a limiting distribution, the distribution of $(Y, Z)$, if the probability is 1 that each diagonal component of $Z$ is different from 0, and if $\Gamma$ is nonsingular, then $(\hat\alpha - \alpha)\alpha^{T-2}$ has as a limiting distribution the distribution of $Y \Gamma^{-1} Z^{-1} c$.

It may be noted that $\Gamma$ is nonsingular if and only if the characteristic roots of $\alpha$ are all different. Since a diagonal component of $Z$ is a linear combination of the components of $z$, all the diagonal components will be different from 0 with probability 1 if the probability is 0 that the components of $z$ satisfy a linear relation.

THEOREM 3.6. If the $u$'s are independently normally distributed, the limiting distribution of $(Y_T, Z_T)$ is that of $(Y, Z)$, where $Y$ and $Z$ are composed of linear combinations of two sets of independent normal variables.

The mean of $z$ is $\alpha x_0$ and the covariance matrix is

$$\sum_{s=0}^{\infty} \alpha^{-s}\Sigma\,\alpha^{-s\prime}. \tag{3.18}$$

The mean of $Y$ is 0. The covariances are harder to describe. Let $w$ be an arbitrary real $p$-component vector and let $W$ be the diagonal matrix with the elements of $cw$ as the diagonal elements. Then the covariance matrix of the $i$th and $j$th rows of $Y W c'^{-1}$ is $\sigma_{ij}\, c^{-1} W \Gamma W c'^{-1}$.

We can give a kind of analogue of Theorem 2.8. In the scalar case, if the $u$'s are independently normally distributed, $\alpha^{T-2}(\hat\alpha - \alpha)$ and $\beta^{2(T-2)} B_T$ have as a limiting distribution the distribution of $(1 - \beta^2)y/z$ and $z^2/(1 - \beta^2)$; this limiting distribution has the property that the conditional distribution of $(1 - \beta^2)y/z$ given $z$ is normal with mean 0 and variance $\sigma^2(1 - \beta^2)/z^2$. In the vector case, if the $u$'s are independently normally distributed and the characteristic roots of $\alpha$ are all different, $(\hat\alpha - \alpha)\alpha^{T-2}$ and $\alpha^{-(T-2)} B_T \alpha^{-(T-2)\prime}$ have as a limiting distribution the distribution of $Y \Gamma^{-1} Z^{-1} c$ and $c^{-1} Z \Gamma Z c'^{-1}$; this has the property that the conditional distribution of $Y \Gamma^{-1} Z^{-1} c$ given $Z$ is normal with mean 0 and covariances $\Sigma \otimes (c^{-1} Z \Gamma Z c'^{-1})^{-1}$. This result can be used to justify the usual procedures of testing hypotheses and forming confidence intervals when the above conditions are satisfied.

The $m$th-order scalar difference equation can be treated by writing it as a special first-order vector equation, letting the vector $x_t$ be made up of the scalars $(x_t, x_{t-1}, \cdots, x_{t-m+1})$; the $m$th-order vector case can be treated similarly.
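The reduction of an $m$th-order equation to first order uses the familiar companion form. In the sketch below (hypothetical helpers `companion` and `step`), the scalar equation is assumed to be $x_t = \alpha_1 x_{t-1} + \cdots + \alpha_m x_{t-m} + u_t$; the vector disturbance is then $(u_t, 0, \cdots, 0)'$. The characteristic roots of the companion matrix are the roots of $\lambda^m - \alpha_1\lambda^{m-1} - \cdots - \alpha_m = 0$, so the stable and unstable cases are read off from them.

```python
def companion(coeffs):
    """Companion matrix for x_t = a_1 x_{t-1} + ... + a_m x_{t-m} + u_t.

    The state vector is (x_t, x_{t-1}, ..., x_{t-m+1})'.
    """
    m = len(coeffs)
    rows = [[float(a) for a in coeffs]]
    for i in range(1, m):
        row = [0.0] * m
        row[i - 1] = 1.0   # new component i is the old component i-1, i.e. x_{t-i}
        rows.append(row)
    return rows


def step(alpha, state, u_t):
    """One step of the first-order vector equation: alpha @ state + (u_t, 0, ..., 0)'."""
    new = [sum(a * s for a, s in zip(row, state)) for row in alpha]
    new[0] += u_t
    return new


alpha = companion([0.5, 0.25, 0.125])
state = [1.0, 2.0, 4.0]          # (x_{t-1}, x_{t-2}, x_{t-3})
print(step(alpha, state, 0.5))   # [2.0, 1.0, 2.0]
```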


4. Asymptotic distributions in the stable case. In this section we assume that the $u$'s are independently and identically distributed and that $|\alpha| < 1$. We show that $\sqrt{T}(\hat\alpha - \alpha)$ has a limiting normal distribution. The important feature here is that the variance of $u_t$ is assumed finite, but nothing is assumed about moments of higher order. Diananda [1] proved a result similar to this when $\alpha = 0$.

THEOREM 4.1. The limiting distribution of $A_T/\sqrt{T}$ is normal with mean 0 and variance $\sigma^4/(1 - \alpha^2)$.

PROOF.

$$A_T = \sum_{t=2}^{T} u_t u_{t-1} + \alpha \sum_{t=3}^{T} u_t u_{t-2} + \cdots + \alpha^{T-2} u_T u_1 + x_0 \sum_{t=1}^{T} \alpha^{t-1} u_t. \tag{4.1}$$

The last term has mean 0 and variance $x_0^2 \sigma^2 (1 - \alpha^{2T})/(1 - \alpha^2)$; since this variance divided by $T$ converges to 0, the last term divided by $\sqrt{T}$ converges stochastically to 0 and can be neglected. Let

$$A_T^* = A_T - x_0 \sum_{t=1}^{T} \alpha^{t-1} u_t. \tag{4.2}$$

Then $A_T^*$ is a linear combination of terms $u_t u_s$, $t \neq s$. Each term has mean $\mathcal{E}u_t u_s = 0$ and variance $\mathcal{E}(u_t u_s)^2 = \mathcal{E}u_t^2\, \mathcal{E}u_s^2 = \sigma^4$. Each term is uncorrelated with each other term.





Let

$$C_{T,S} = \sum_{t=2}^{T} u_t u_{t-1} + \alpha \sum_{t=3}^{T} u_t u_{t-2} + \cdots + \alpha^{S} \sum_{t=S+2}^{T} u_t u_{t-S-1} \tag{4.3}$$

for $S \le T - 2$, and let $C_{T,S} = A_T^*$ for $S > T - 2$. Then $A_T^* - C_{T,S}$ has mean 0 and variance bounded by

$$\sigma^4 \frac{\alpha^{2(S+1)}}{1 - \alpha^2}\left[T - (S + 2)\right].$$

Then $A_T^*/\sqrt{T} - C_{T,S}/\sqrt{T}$ has mean 0 and a variance bounded (uniformly in $T$) by $\sigma^4 \alpha^{2(S+1)}/(1 - \alpha^2)$. This can be made arbitrarily small by making $S$ sufficiently large. Now let

$$C_{T,S}^* = \sum_{t=S+2}^{T}\left[u_t u_{t-1} + \alpha u_t u_{t-2} + \cdots + \alpha^{S} u_t u_{t-S-1}\right]. \tag{4.5}$$

The limiting distribution of $C_{T,S}^*/\sqrt{T}$ is the same as that of $C_{T,S}/\sqrt{T}$. Let

$$y_t = u_t u_{t-1} + \alpha u_t u_{t-2} + \cdots + \alpha^{S} u_t u_{t-S-1}. \tag{4.6}$$

Then

$$\mathcal{E}y_t^2 = \sigma^4\left(1 + \alpha^2 + \cdots + \alpha^{2S}\right), \tag{4.7}$$

$$\mathcal{E}y_t y_s = 0, \qquad t \neq s, \tag{4.8}$$

and $y_t$ is an $(S + 1)$-dependent sequence. Theorem 4.4 below applies, and hence $C_{T,S}^*/\sqrt{T}$ has a limiting normal distribution with mean 0 and variance (4.7). Theorem 4.5 below completes the proof.

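THEOREM 4.2.

$$\operatorname*{plim}_{T \to \infty} B_T/T = \sigma^2/(1 - \alpha^2). \tag{4.9}$$

PROOF.

$$\begin{aligned} B_T = \sum_{t=1}^{T} x_{t-1}^2 &= x_0^2 + (u_1 + \alpha x_0)^2 + (u_2 + \alpha u_1 + \alpha^2 x_0)^2 + \cdots + (u_{T-1} + \alpha u_{T-2} + \cdots + \alpha^{T-1} x_0)^2 \\ &= \left[u_1^2(1 + \alpha^2 + \cdots + \alpha^{2(T-2)}) + \cdots + u_{T-1}^2\right] \\ &\quad + 2\left[\alpha(u_2 u_1 + \cdots + u_{T-1} u_{T-2}) + \cdots + \alpha^{T-2} u_{T-1} u_1\right] \\ &\quad + 2x_0\left[u_1(\alpha + \alpha^3 + \cdots + \alpha^{2T-3}) + \cdots + \alpha^{T-1} u_{T-1}\right] \\ &\quad + x_0^2\left(1 + \alpha^2 + \cdots + \alpha^{2(T-1)}\right). \end{aligned} \tag{4.10}$$

The last term divided by $T$ converges to 0. The next-to-last term has mean 0 and variance bounded by a constant times $T$; when this term is divided by $T$ it converges stochastically to 0. The second bracket has mean 0 and variance

$$\left[\alpha^2(T - 2) + \alpha^4(T - 3) + \cdots + \alpha^{2(T-2)}\right]\sigma^4 \le T\sigma^4\left[1 + \alpha^2 + \cdots\right] = T\sigma^4/(1 - \alpha^2).$$

This term divided by $T$ converges stochastically to 0. Thus $B_T/T$ has the probability limit of the first bracket divided by $T$. But

$$\frac{1}{1 - \alpha^2}\sum_{t=1}^{T-1} u_t^2 - \left[u_1^2(1 + \alpha^2 + \cdots + \alpha^{2(T-2)}) + \cdots + u_{T-1}^2\right] \tag{4.11}$$

$$= u_1^2(\alpha^{2(T-1)} + \alpha^{2T} + \cdots) + \cdots + u_{T-1}^2(\alpha^2 + \alpha^4 + \cdots) = \frac{\alpha^{2(T-1)}}{1 - \alpha^2}u_1^2 + \cdots + \frac{\alpha^2}{1 - \alpha^2}u_{T-1}^2. \tag{4.12}$$

This is a nonnegative random variable with expected value

$$\frac{\sigma^2}{1 - \alpha^2}\left(\alpha^2 + \alpha^4 + \cdots + \alpha^{2(T-1)}\right) = \frac{\sigma^2 \alpha^2 (1 - \alpha^{2(T-1)})}{(1 - \alpha^2)^2} \le \frac{\sigma^2 \alpha^2}{(1 - \alpha^2)^2}, \tag{4.13}$$

and divided by $T$ it converges to 0. Thus

$$\operatorname*{plim}_{T \to \infty} \frac{B_T}{T} = \operatorname*{plim}_{T \to \infty}\left(\frac{1}{1 - \alpha^2}\cdot\frac{1}{T}\sum_{t=1}^{T-1} u_t^2\right) = \frac{\sigma^2}{1 - \alpha^2} \tag{4.14}$$

by the law of large numbers.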

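THEOREM 4.3. The limiting distribution of $\sqrt{T}(\hat\alpha - \alpha)$ is normal with mean 0 and variance $1 - \alpha^2$.

PROOF.

$$\sqrt{T}(\hat\alpha - \alpha) = \sqrt{T}\,\frac{A_T}{B_T} = \frac{A_T/\sqrt{T}}{B_T/T}. \tag{4.15}$$

By Theorem 4.1 the numerator has a limiting normal distribution with mean 0 and variance $\sigma^4/(1 - \alpha^2)$, and by Theorem 4.2 the denominator converges stochastically to $\sigma^2/(1 - \alpha^2)$; hence the ratio has a limiting normal distribution with mean 0 and variance $[\sigma^4/(1 - \alpha^2)]/[\sigma^2/(1 - \alpha^2)]^2 = 1 - \alpha^2$.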
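Theorem 4.3 needs only a finite variance for the $u$'s. As an illustration under stated assumptions, the sketch below uses skewed, centred exponential disturbances (mean 0, variance 1) with $\alpha = 0.5$, $x_0 = 0$, and reports the Monte Carlo variance of $\sqrt{T}(\hat\alpha - \alpha)$, which should be close to $1 - \alpha^2 = 0.75$. The sample sizes and the function name `stable_case_variance` are arbitrary choices for illustration.

```python
import random


def stable_case_variance(alpha, T, reps, seed=0):
    """Monte Carlo variance of sqrt(T)(alpha_hat - alpha) for |alpha| < 1.

    Disturbances are centred exponential (mean 0, variance 1, skewed); x_0 = 0.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x_prev, A_T, B_T = 0.0, 0.0, 0.0
        for _ in range(T):
            u = rng.expovariate(1.0) - 1.0
            A_T += u * x_prev          # (2.1): A_T = sum u_t x_{t-1}
            B_T += x_prev * x_prev     # (2.2): B_T = sum x_{t-1}^2
            x_prev = alpha * x_prev + u
        values.append(T ** 0.5 * A_T / B_T)   # sqrt(T)(alpha_hat - alpha)
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / reps


print(stable_case_variance(0.5, 400, 1500))  # should be near 1 - 0.5**2 = 0.75
```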

This proof exploits the fact that the second-order moments of $A_T$ involve only the second-order moments of the $u$'s (because $A_T$ involves only products of independent $u$'s) and that a special central limit theorem applies. The result can easily be extended to the vector case, where the characteristic roots of the matrix $\alpha$ are less than 1 in absolute value. In turn this permits extension to the general-order difference equation (scalar or vector) in the stable case. The case of

$$x_t = \alpha x_{t-1} + \gamma + u_t \tag{4.16}$$

again can be treated this way. However, the case of

$$x_t = \alpha x_{t-1} + z_t + u_t, \tag{4.17}$$

where $z_t$ is a sequence of fixed variates, will not in general yield to this treatment (unless restrictions are made so that $z_t$ washes out asymptotically); the reason is that in addition to terms like $u_t u_{t-s}$ there will be terms $u_t z_s$, and these will not be identically distributed.





The following central limit theorem was given by Diananda:

THEOREM 4.4. Let $y_1, y_2, \cdots$ be a sequence of random variables such that the distribution of $(y_{t+t_1}, y_{t+t_2}, \cdots, y_{t+t_n})$ is independent of $t$ for every $t_1 < t_2 < \cdots < t_n$ ($t_i \ge 0$) and $n$, and such that this collection is independent of $(y_{s_1}, y_{s_2}, \cdots, y_{s_r})$ for every $s_1 < s_2 < \cdots < s_r$ ($s_i \ge 0$) and $r$ if $s_1 > t + t_n + m$. Assume $\mathcal{E}y_t = 0$, $\mathcal{E}y_t^2 < \infty$. Then $\sum_{t=1}^{T} y_t/\sqrt{T}$ has a limiting normal distribution with mean 0 and variance

$$\mathcal{E}y_t^2 + 2\mathcal{E}y_t y_{t+1} + \cdots + 2\mathcal{E}y_t y_{t+m}. \tag{4.18}$$

The sequence $y_t$ is called $m$-dependent. The proof depends upon a theorem proved by Diananda, which is similar to the following:²

THEOREM 4.5. Let

$$S_T = Z_{kT} + X_{kT}, \tag{4.19}$$

such that

$$\mathcal{E}X_{kT}^2 \le M_k, \tag{4.20}$$

$$\lim_{k \to \infty} M_k = 0, \tag{4.21}$$

$$\Pr\{Z_{kT} \le z\} = F_{kT}(z) \to F_k(z) \quad \text{as } T \to \infty, \tag{4.22}$$

$$\lim_{k \to \infty} F_k(z) = F(z) \tag{4.23}$$

at every continuity point. Then

$$\lim_{T \to \infty} \Pr\{S_T \le z\} = F(z) \tag{4.24}$$

at every continuity point of $F(z)$.

The condition on $X_{kT}$ is essentially that it converge stochastically to 0 uniformly in $T$.
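The variance formula (4.18) can be illustrated with a simple hypothetical 1-dependent sequence, $y_t = u_t + u_{t-1}$ with independent standard normal $u$'s, for which $\mathcal{E}y_t^2 = 2$ and $\mathcal{E}y_t y_{t+1} = 1$, so (4.18) gives 4. Unlike (4.6), this sequence has a nonzero lag-one covariance, which is exactly what the $2\mathcal{E}y_t y_{t+k}$ terms account for. The sketch below (hypothetical helper names) compares the formula with a Monte Carlo estimate.

```python
import random


def clt_variance_formula(autocov):
    """(4.18): E y_t^2 + 2 E y_t y_{t+1} + ... + 2 E y_t y_{t+m}, given [gamma_0, ..., gamma_m]."""
    return autocov[0] + 2 * sum(autocov[1:])


def simulated_variance(T, reps, seed=0):
    """Sample variance of sum_{t=1}^T y_t / sqrt(T) for the 1-dependent y_t = u_t + u_{t-1}."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        u_prev = rng.gauss(0.0, 1.0)   # u_0
        total = 0.0
        for _ in range(T):
            u = rng.gauss(0.0, 1.0)
            total += u + u_prev        # y_t = u_t + u_{t-1}
            u_prev = u
        values.append(total / T ** 0.5)
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / reps


print(clt_variance_formula([2.0, 1.0]))   # 4.0
print(simulated_variance(100, 4000))      # should be near 4 - 2/T = 3.98
```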


REFERENCES
[1] P. H. Diananda, "Some probability limit theorems with statistical applications," Proc. Cambridge Philos. Soc., Vol. 49 (1953), pp. 239-246.
[2] T. Koopmans, Ed., Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, John Wiley and Sons, New York, 1950.
[3] John S. White, "The limiting distribution of the serial correlation coefficient in the explosive case," Ann. Math. Stat., Vol. 29 (1958), pp. 1188-1197.


² Theorems 4.4 and 4.5 were proved for the present paper before the author was aware of Diananda's results.





ON THE IDENTIFIABILITY PROBLEM FOR FUNCTIONS OF FINITE 
MARKOV CHAINS 


By Edgar J. Gilbert¹


Sandia Corporation 


0. Summary. A stationary sequence $\{Y_n : n = 1, 2, \cdots\}$ of random variables with $D$ values (states) is said to be a function of a finite Markov chain if there is an integer $N \ge D$, an $N \times N$ irreducible aperiodic Markov matrix $M$, a stationary Markov chain $\{X_n\}$ with transition matrix $M$, and a function $f$ such that $Y_n = f(X_n)$. For any finite sequence $s$ of states of $\{Y_n\}$, let $p(s) = P\{(Y_1, \cdots, Y_n) = s\}$. For any state $\epsilon$, let $s\epsilon t$ be the sequence $s$ followed by $\epsilon$ followed by the sequence $t$. For every state $\epsilon$, let $n(\epsilon)$ be the largest integer $n$ such that there are finite sequences $s_1, \cdots, s_n, t_1, \cdots, t_n$ such that the matrix $\|p(s_i\epsilon t_j) : 1 \le i, j \le n\|$ is nonsingular.

If $\{Y_n\}$ is a function of a finite Markov chain, then $\sum n(\epsilon) \le N$. There is a finite set $\{s_1, \cdots, s_N, t_1, \cdots, t_N\}$ of finite sequences such that $p(s)$ satisfies the recurrence relations

$$p(s\epsilon t) = \sum_{f(i) = \epsilon} a_i(s)\, p(s_i\epsilon t), \tag{1}$$

where $a_i(s)$ either is zero for all $s$ or else is a ratio of determinants involving only $p(s\epsilon t_k)$ and $p(s_j\epsilon t_k)$ for $f(j) = f(k) = f(i)$.

If $\{Y_n\}$ has $D$ states and is a function of a Markov chain having $N$ states, then the entire distribution of $\{Y_n\}$ is determined by the distribution of sequences of length $\le 2(N - D + 1)$. For each $N$ and $D$, a function of a Markov chain is exhibited which attains this bound.

If there is a Markov chain $\{X_n\}$ with $N = \sum n(\epsilon)$ states such that $\{Y_n\}$ is a function of $\{X_n\}$, then $\{Y_n\}$ is said to be a regular function of a Markov chain. If $\{Y_n\}$ is a regular function of a Markov chain having transition matrix $M$, then $M = X^{-1}AX$, where $A$ is an $N \times N$ matrix with elements $a_{ij} = a_j(s_i f(i))$ defined by (1) above, and $X = \|x_{ij}\|$ is a nonsingular $N \times N$ matrix such that $x_{ij} = 0$ unless $f(i) = f(j)$, the first row of each nonzero submatrix along the diagonal consists of positive numbers, and $\sum_j x_{ij} = p(s_i f(i))$. Any $N \times N$ Markov matrix giving the same distribution for $\{Y_n\}$ can be written in this form, with the same $A$ and with an $X$ having the above properties. Any matrix of this form which has all elements nonnegative is a Markov matrix giving the same distribution for $\{Y_n\}$. There are $\sum\{n(\epsilon)\}^2 - N$ "unidentifiable" parameters in the matrix $X$, and at most $N^2 - \sum\{n(\epsilon)\}^2$ "identifiable" parameters, determined by the distribution of $\{Y_n\}$, in the matrix $A$.


Received October 20, 1958. 

¹ The major part of this work was completed while the author was a graduate student at the University of California, Berkeley, and was supported (in part) by funds provided under contract AF 41(657)-29 with the USAF School of Aviation Medicine, Randolph Field, Texas, and by the Office of Ordnance Research, U. S. Army, under contract DA-04-200-ORD-171.







FUNCTIONS OF FINITE MARKOV CHAINS 689 


1. Introduction. Suppose a process is known to be a stationary irreducible 
aperiodic Markov chain with a finite number of states (for definitions and proper- 
ties of such chains, see [2]), but for some reason the states of the process can- 
not be directly observed. Suppose the states of the process are partitioned into 
groups, and that one can identify the group from which an observation came, 
but not which state in the group was observed. The observable process is again 
stationary, with a strictly positive stationary distribution, but it is not, in 
general, a Markov chain. A given Markov matrix, together with the function 
which partitions the states into groups, uniquely determines the distribution 
of the observable process. However, for a given function, there is in general 
more than one Markov chain which gives rise to the same observable proc- 
ess. For this reason, even if the entire distribution of the observable process is 
known, the matrix of transition probabilities for the original process cannot be 
uniquely determined. The general problem being considered here is the question: 
what characteristics of the observable process are needed in order to identify 
the class of Markov chains which could give rise to it? 

Functions of a finite Markov chain were studied from a different point of view 
by Harris [3] under the name "grouped Markov chains." For the case of a Markov
matrix having all elements positive, he obtained an expression for the conditional
distributions, $P\{Y_n = \epsilon \mid Y_{n-1} = \mu_1, \cdots, Y_{n-k} = \mu_k\}$, of the observable process,
in terms of a finite set of continuous distributions on [0, 1] whose generating 
functions are determined by the originating Markov matrix. He did not, how- 
ever, study the identifiability problem. Blackwell and Koopmans [1] showed 
that for any function of a finite Markov chain, there is a finite integer J such that 
the entire distribution of the function process is determined by the distribution 
of observable sequences of length not exceeding J, and obtained, for a Markov
chain with N states, an upper bound of 2N’ + 1 for J. They also considered, 
and “almost” solved, the identifiability problem for two special cases: (a) the 
N states are grouped (1, N — 1), and (b) N = 4 and the grouping is (2, 2). 
The methods used in this paper are extensions of the method used for (b) by 
Blackwell and Koopmans. 

Before proceeding with the investigation, it is necessary to develop some notation which will be used throughout what follows. $\{Y_n : n = 1, 2, \cdots\}$ will always be a stationary (irreducible aperiodic) sequence of random variables with a finite number, $D$, of states, which we assume are the integers $0, 1, \cdots, D - 1$. All elements of the stationary distribution for $\{Y_n\}$ are assumed to be positive. States of $\{Y_n\}$ will be denoted by Greek letters $\epsilon$ and $\mu$, with or without subscripts, and letters $s$ and $t$, with or without subscripts, will stand for finite sequences of states of $\{Y_n\}$. The sequence "$s$ followed by $t$" will be written "$st$." We will have occasion to refer also to the empty sequence, $\emptyset$, and $\emptyset s$ and $s\emptyset$ will both represent the sequence $s$.

$M = \|m_{ij}\|$ will be an $N \times N$ irreducible aperiodic Markov matrix, and $\{\pi_i : 1 \le i \le N\}$ will be the (unique) stationary distribution associated with $M$, all $\pi_i > 0$. $\{X_n : n = 1, 2, \cdots\}$ will be a stationary Markov chain with $M$ as matrix of transition probabilities. Let $f$ be a function on $\{1, 2, \cdots, N\}$ to $\{0, 1, \cdots, D - 1\}$ and let $N_\epsilon$ = the number of states in $f^{-1}(\epsilon)$. Let $K_\epsilon = N_0 + \cdots + N_{\epsilon-1}$ ($K_0 = 0$). For notational convenience, we assume that $f$ is nondecreasing; i.e., $f^{-1}(\epsilon) = \{K_\epsilon + 1, \cdots, K_\epsilon + N_\epsilon\}$.

For any finite sequence $s$, and any state $i$ of $\{X_n\}$, define

$$p(s) = P\{(Y_1, \cdots, Y_n) = s\}, \tag{2}$$

$$q_i(s) = P\{(Y_1, \cdots, Y_n) = s,\ X_{n+1} = i\}, \tag{3}$$

$$r_i(s) = P\{(Y_2, \cdots, Y_{n+1}) = s \mid X_1 = i\}. \tag{4}$$

It will be useful also to define, for any $i$, $q_i(\emptyset) = \pi_i$ and $r_i(\emptyset) = 1$. Then for any $s$, $t$, $\epsilon$, and $\mu$ (including $s$ or $t = \emptyset$), it is evident that

$$p(s\epsilon t) = \sum_{f(i) = \epsilon} q_i(s)\, r_i(t) \tag{5}$$

and

$$p(s\epsilon\mu t) = \sum_{f(i) = \epsilon}\ \sum_{f(j) = \mu} q_i(s)\, m_{ij}\, r_j(t). \tag{6}$$

These two equations will be basic in all that follows, and we shall take advantage of the notational simplification possible by restating them in the form of matrix equations. For any set of sequences $s_1, s_2, \cdots, s_n$ and any $\epsilon$, let $Q_\epsilon(s_1, \cdots, s_n)$ be the $n \times N_\epsilon$ matrix whose $(i, j)$th element is $q_{K_\epsilon + j}(s_i)$. Let $R_\epsilon(t_1, \cdots, t_m)$ be the $N_\epsilon \times m$ matrix whose $(i, j)$th element is $r_{K_\epsilon + i}(t_j)$. The function $f$ induces a partition of $M$ into submatrices $M_{\epsilon\mu}$, where the $(i, j)$th element of $M_{\epsilon\mu}$ is $m_{K_\epsilon + i,\, K_\mu + j}$. Finally, let $P_\epsilon(s_1, \cdots, s_n; t_1, \cdots, t_m)$ be the $n \times m$ matrix whose $(i, j)$th element is $p(s_i\epsilon t_j)$. Then (5) and (6) become

$$P_\epsilon(s_1, \cdots, s_n; t_1, \cdots, t_m) = Q_\epsilon(s_1, \cdots, s_n)\, R_\epsilon(t_1, \cdots, t_m) \tag{7}$$

and

$$P_\epsilon(s_1, \cdots, s_n; \mu t_1, \cdots, \mu t_m) = Q_\epsilon(s_1, \cdots, s_n)\, M_{\epsilon\mu}\, R_\mu(t_1, \cdots, t_m) = P_\mu(s_1\epsilon, \cdots, s_n\epsilon; t_1, \cdots, t_m). \tag{8}$$
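Equations (5) and (7) can be checked mechanically. The sketch below computes $q_i(s)$, $r_i(t)$ and $p(s)$ for a small hypothetical chain with $N = 3$, $D = 2$ and $f = (0, 1, 1)$ (hidden states are indexed from 0 rather than 1), and verifies (5) for one choice of $s$, $\epsilon$, $t$. The stationary distribution is found by power iteration, which assumes $M$ is irreducible and aperiodic; all helper names are hypothetical.

```python
from itertools import product


def stationary(M, iters=2000):
    """Stationary distribution of M by power iteration (assumes M irreducible, aperiodic)."""
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi


def q_vec(M, f, pi, s):
    """[q_i(s)] with q_i(s) = P{(Y_1, ..., Y_n) = s, X_{n+1} = i}; q_i(empty) = pi_i."""
    n = len(M)
    q = list(pi)
    for e in s:
        q = [sum(q[i] * M[i][j] for i in range(n) if f[i] == e) for j in range(n)]
    return q


def r_vec(M, f, t):
    """[r_i(t)] with r_i(t) = P{(Y_2, ..., Y_{m+1}) = t | X_1 = i}; r_i(empty) = 1."""
    n = len(M)
    r = [1.0] * n
    for e in reversed(t):
        r = [sum(M[i][j] * r[j] for j in range(n) if f[j] == e) for i in range(n)]
    return r


def p(M, f, pi, s):
    """p(s) = P{(Y_1, ..., Y_n) = s}, i.e. (5) with t empty and s split as s' eps."""
    if not s:
        return 1.0
    q = q_vec(M, f, pi, s[:-1])
    return sum(q[i] for i in range(len(M)) if f[i] == s[-1])


# Hypothetical chain: N = 3 hidden states (indexed 0, 1, 2), D = 2, f = (0, 1, 1).
M = [[0.2, 0.5, 0.3],
     [0.4, 0.1, 0.5],
     [0.3, 0.3, 0.4]]
f = [0, 1, 1]
pi = stationary(M)

# Probabilities of all observed sequences of length 3 should sum to 1.
print(round(sum(p(M, f, pi, seq) for seq in product((0, 1), repeat=3)), 12))  # 1.0

# Check (5) for s = (1, 0), eps = 1, t = (0, 1).
s, eps, t = (1, 0), 1, (0, 1)
q, r = q_vec(M, f, pi, s), r_vec(M, f, t)
lhs = p(M, f, pi, s + (eps,) + t)
rhs = sum(q[i] * r[i] for i in range(len(M)) if f[i] == eps)
print(abs(lhs - rhs) < 1e-12)  # True
```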


A Markov chain is characterized by the property that the conditional probability of the sequence $s\epsilon\mu$, given $s\epsilon$, is independent of $s$. In terms of the functions $p(s)$, this is $p(s\epsilon\mu)/p(s\epsilon) = p(\epsilon\mu)/p(\epsilon)$. In fact, for any sequences $s_1$, $t_1$, $s_2$, $t_2$, we have

$$\begin{vmatrix} p(s_1\epsilon t_1) & p(s_1\epsilon t_2) \\ p(s_2\epsilon t_1) & p(s_2\epsilon t_2) \end{vmatrix} = 0.$$

In still other words, the largest square matrix of the form $\|p(s_i\epsilon t_j)\|$ which is nonsingular is one by one. It is this property which we shall generalize to functions of a Markov chain.

For each state $\epsilon$ of a stationary sequence $\{Y_n\}$ of random variables, let $n(\epsilon)$ be the largest integer $n$ such that there are finite sequences $s_1, \cdots, s_n, t_1, \cdots, t_n$ of states of $\{Y_n\}$ such that the matrix $\|p(s_i\epsilon t_j) : 1 \le i, j \le n\|$ is nonsingular. (If no such largest integer exists, let $n(\epsilon) = \infty$.)

LEMMA 1. If $\{Y_n\}$ is a function of a finite Markov chain, then $\sum n(\epsilon) \le N$.

PROOF. By Equation (7), for any set $\{s_i, t_j : 1 \le i, j \le N_\epsilon + 1\}$ of finite sequences of states of $\{Y_n\}$,

$$P_\epsilon(s_i; t_j : 1 \le i, j \le N_\epsilon + 1) = Q_\epsilon(s_i : 1 \le i \le N_\epsilon + 1)\, R_\epsilon(t_j : 1 \le j \le N_\epsilon + 1), \tag{9}$$

and each of the matrices on the right-hand side has rank at most $N_\epsilon$, so the product is singular. Thus $n(\epsilon)$ cannot be larger than $N_\epsilon$. Therefore

$$\sum n(\epsilon) \le \sum N_\epsilon = N.$$

It is an interesting conjecture that $\sum n(\epsilon) < \infty$ is a necessary and sufficient condition that a stationary (irreducible aperiodic) sequence be a function of a finite Markov chain. We shall later see evidence which seems to support this conjecture, but the writer has not been able to complete a proof (or disproof) of it.


2. Regular Functions. The set of all $N \times N$ irreducible aperiodic Markov matrices may be thought of as a subset of Euclidean $N(N - 1)$-dimensional space. For a given function $f$, the set for which $\sum n(\epsilon) < N$ is a set of dimension less than $N(N - 1)$, having Lebesgue measure zero in the set of all $N \times N$ Markov matrices. In this sense, the case where $\sum n(\epsilon) = N$ is the most important case to investigate. For this reason, and for others to be mentioned later, we shall say that $\{Y_n\}$ is a regular function of a Markov chain if there is a representation of $\{Y_n\}$ as a function of a finite Markov chain having $N$ states, and $\sum n(\epsilon) = N$. In the remainder of this section, unless otherwise stated, we shall assume that $\{Y_n\}$ is a regular function of a Markov chain. Some of the results of this section are true also for the case $\sum n(\epsilon) < N$, and these will be pointed out in the next section.

Let $s_1, \cdots, s_N, t_1, \cdots, t_N$ be a set of sequences such that for each $\epsilon$, $P_\epsilon(s_i; t_j : f(i) = f(j) = \epsilon)$ is nonsingular. Then for each $\epsilon$, the rows of $Q_\epsilon(s_i : f(i) = \epsilon)$ form a basis for Euclidean $N_\epsilon$-space. In order to obtain a basis which is associated with sequences of minimum length, it is convenient to order the set of finite sequences (including $\emptyset$, considered as a sequence of length zero) in such a way that $s$ follows $t$ if length $(s) >$ length $(t)$, and (say) numerical order for sequences of the same length. Then we may start with $Q_\epsilon(\emptyset)$ and proceed to consider each sequence in order until we have found a basis. In this manner we obtain a set $\{s_i^* : 1 \le i \le N\}$ with the property that for every $s$,

$$Q_\epsilon(s) = \sum_{f(i) = \epsilon} a_i(s)\, Q_\epsilon(s_i^*), \tag{10}$$

where $a_i(s) = 0$ if length $(s) <$ length $(s_i^*)$, and no set $\{s_i\}$ satisfying (10) has maximum length $(s_i) <$ maximum length $(s_i^*)$. A similar procedure will obtain a set $\{t_j^* : 1 \le j \le N\}$ such that for every $t$,

$$R_\epsilon(t) = \sum_{f(j) = \epsilon} b_j(t)\, R_\epsilon(t_j^*), \tag{11}$$

with corresponding properties for the $b_j$'s and $t_j^*$'s. We shall assume from now on that the sequences have been chosen in the first place so that $s_i = s_i^*$, $t_j = t_j^*$, and shall drop the asterisk in the notation.

LEMMA 2. If $\{Y_n\}$ has $D$ states and is a regular function of a Markov chain with $N$ states, and if $\{s_i\}$ and $\{t_j\}$ are the sets of sequences such that $\|p(s_i\epsilon t_j)\|$ is nonsingular, chosen as in (10) and (11), then maximum length $(s_i) \le N - D$ and maximum length $(t_j) \le N - D$.

PROOF. Let $m$ be any integer. Suppose for all $s$ such that length $(s) \le m$, and all $\epsilon$, that $Q_\epsilon(s) = \sum a_i(s) Q_\epsilon(s_i)$, where length $(s_i) < m$. Then for all $s$ with length $(s) \le m$ and all $\mu$,

$$Q_\mu(s\epsilon) = Q_\epsilon(s) M_{\epsilon\mu} = \sum a_i(s) Q_\epsilon(s_i) M_{\epsilon\mu} = \sum a_i(s) Q_\mu(s_i\epsilon).$$

Now length $(s_i\epsilon) \le m$; so $Q_\mu(s_i\epsilon) = \sum_k a_k(s_i\epsilon) Q_\mu(s_k)$, and therefore

$$Q_\mu(s\epsilon) = \sum_i \sum_k a_i(s) a_k(s_i\epsilon) Q_\mu(s_k) = \sum_k a_k(s\epsilon) Q_\mu(s_k).$$

That is, for all $s$ such that length $(s) \le m + 1$, and all $\epsilon$, $Q_\epsilon(s) = \sum a_i(s) Q_\epsilon(s_i)$, where length $(s_i) < m$. Then by induction we obtain that the result holds for all $s$ and all $\epsilon$, with the maximum length of the sequences $s_i$ being less than $m$.

Therefore in the set $\{s_i : 1 \le i \le N\}$ there must be at least one sequence of each length up to the maximum length. For if any length were skipped, then so would be all following. Since there are $N - D$ sequences not $\emptyset$ in the set, the maximum length for $s_i$ is not greater than $N - D$. A similar argument holds for sequences $t_j$.

THeoreM |. /f { Y,} has D states and is a regular function of a Markov chain 
having N = > n( €) states, then the entire distribution of | Y,} is determined by 
the set of functions {p(s): length (s) S$ 2(N — D+ 1)}. 

Proor. Multiplying both sides of equation (10) by R,.(t), we obtain for every 
8, e, and ¢, 

(12) p(set) = ye a,(s)p(s;et). 

(+) =e 
Setting ¢ successively equal to ¢; for each j such that f(j7) = €, we get a set of 
N. independent linear equations in the N, functions a,;(s). By Cramer’s rule, 
we can solve the system of equations, obtaining a;(s) as a ratio of determinants 
which involve only p(sjet,) and p(set,) for f(j) = f(k) = f(t) = «. 

Let J = maximum length (s,f(¢)f(j)t;). Then (12) expresses p(s) for all s 
of length greater than J in terms of the probabilities of sequences of length 
< length (s), and by repeated use, in terms of the probabilities of sequences of 
length < /. But by lemma 2, J S 2(N — D+ 1). This completes the proof. 

We may obtain the same result, and another recurrence relation, by multi- 
plying both sides of (11) by Q.(s), to get 


(13) p(s) = >. b,(t)p(set;). 


f(j)=e 





FUNCTIONS OF FINITE MARKOV CHAINS 693 


The calculations in Lemma 2 indicate that a process which achieves the upper 
bound might be obtained by choosing one sequence s; of each length from 1 to 
N — D, and similarly for t;. By choosing a function which is one to one on 
D — 1 states and which groups the remaining N — D + 1 states together, we 
obtain for each N and D a pair (f, M) which attains the upper bound. 


let f(i) =O 1 StsN-D+1 
f(t) =i-(N-D+1)fN-D+2s8 
i oO << i < Ge es ee ee ee 
Let m;; = Oif1 Si,j7 SN —-D+1andi #y, 
m; =a if1si,j = N — D+ 1andi = j, 
mi; = (1—a;)/(D-1)fN-D+1<igNandl SjsN- 
D+1, 
=(l-—a,)(D-1)ifflsisN-—-D+landN-D+1<jsN, 


_°AD—-1)-N+at+at 
(D — 1) 


s++ + Ay_p4t ifN—-D +1l< t, 


i323 N. 

It is easily verified that M = || m;;\| isa doubly stochastic Markov matrix. 
If {X,,} is a stationary Markov chain having transition matrix M, then {f(X,)} 
is a function of a Markov chain for which we may choose s; = 4; = @, & = bk = 0, 

» Sa—p41 = tn_p41 = a sequence of N — D 0’s, sy-p = 

Sy = ly = 2D. 
Then if Po = Po(s,°--, 8v—-v41, 4, °°*, tw—p41), then 
Pol= NP?" TT (a - a;)f’. 
Isi<jsN—D+2 

Since all a; are distinct, | Py| # 0, and therefore the distribution of {f(X,,.)} is 
not determined unless the probability of a sequence of 2(N — D + 1) 0’s is 
known. 

We shall see in Section 3 that Theorem 1 is true in general for a function of a 
finite Markov chain, with N replaced by }-n(e). At this point, then, we have a 
partial answer to our question. The class of Markov matrices which could gen- 
erate the observable process is determined by the set of functions p(s) for s 
having length s 2(N — D + 1). More precisely, following Blackwell and 
Koopmans [1], let us say that a finite set S of functions p; defined on the set of 
N X N irreducible aperiodic Markov matrices is a complete set of invariants 
relative to a function f if and only if p;(M,) = p;(M,) for all p; in S when and 
only when M, and M, give the same distribution for {f( X,,)}. A complete set of 
invariants is said to be minimal if no proper subset is complete. Then the result 
of Theorem 1 (as extended in Section 3) is that the set of functions 





694 EDGAR J. GILBERT 


{p(s):length (s) < 2(N — D + 1)} is a complete set of invariants relative to 
any function taking N states into D states. It is not a minimal complete set 
relative to any particular function, as some of the probabilities listed are deter- 
mined by others. 

However, if for some e, n(e) < N,, there may be a pair (f’, M’) such that 
M' isa > n( ax yn e) Markov matrix, and {Y,} has the same distribution 
as {f’( Zz. )}. Then a complete set of invariants relative to f’ could be found which 
contained fewer functions than any complete set relative to f. In this case, 
(f’, M’) would seem to be a more natural representation than (f, M). This fur- 
nishes a second reason for looking at regular functions of a Markov chain. 

An example used in another connection by Blackwell and Koopmans furnishes 
a good illustration. Let 


403 — 2 
= a ig 
M =| $y} — y 

$23 — 2 


and f(1) = 0, f(2) = f(3) = 1. {p(1), p11), pill), p(1111)} is a minimal 
complete set of invariants relative to f. However, for the particular matrix VM, 
{Y,} is a Markov chain, and its distribution is determined by {p(1), p(11)}, 


which is a minimal complete set relative to f’:f’(1) = 0, f’(2) = 1, with 
2 3 
, —_— 
M =|; ; 
2 2 


being the only 2 X 2 matrix which gives the proper distribution for { Y,}. The 
parameters x, y, and z are all unidentifiable by observation of the process { Y,}. 

Next we shall obtain a parametric representation of the equivalence class of 
all N X N Markov matrices which give the same distribution for a given regular 
function of a Markov chain. In the process, we shall need to look more closely 
at individual sequences s; and ¢; and functions a; than our present notation 
conveniently allows. So if z is any of these symbols, let z.; = zx,+; ; that is, 
2.; is the 7th z associated with the state e. Also, if W = || w;; || isanyN X N matrix, 
let W,, be the n(e) X n(u) submatrix for which f(7) = ¢, f(j) = ux. 

Let A be the N X N matrix whose (7, j)th element is a;(s,f(i)), where a;(s) 
is the function defined by (12). Then as a consequence of (12), for every e, 
wu, and tf, 


14 Pl Sy1 » *** 9 Sunt) 5 ed) = P.( Sym, ee 5 Suntu)h; t) 
(14) 


= Bode 9 °° * » Senle) - £2: 


By induction, then, if t = ee --+ ene, 


(15) P,( 8,1 9 °** » Sunt) 5 t) = Aine: Avics or Age (8a 9 °° * » Sente) 5 @ Ee 


Using these facts, we may now prove 
THEOREM 2. Let | Y,,} be a regular function of a Markov chain {X,} with N K N 
transition matrix M. Let A be the N X N matrix whose (1, j)th element is a;(8,;f(7) ). 





FUNCTIONS OF FINITE MARKOV CHAINS 695 


Then there is a nonsingular N X N matrix X such that (i) x;; = 0 unless f(i) = 
S(j), (ii) the first row of each X . consists of strictly positive numbers, (iii) >. jai; = 
p(sf(i)), and M = X*AX. Any matrix equivalent to M can be written in 
this form with the same A, and for any nonsingular X satisfying (i), (ii), and 
(iii) which makes all elements of X”'AX nonnegative, X-'AX is a Markov 
matrix equivalent to M. 

Proor. Let P, Q, and R be the N X N matrices whose (7, 7)th elements are 
respectively p(s,et;), q;(s8;), and r,(t;) if f(¢) = f(j) = «, and zero if f(i) ¥ f(j). 
Let C be the N X N matrix whose (7, j)th element is p(s,;f(7)f(j)t;). Then by 
(14) C = AP, and by (8), C = QMR. So M = Q'APR™ = Q"'AQ. Thus Q 
satisfies the requirements on X. 

Any matrix M’ equivalent to M defines a matrix Q’ which satisfies the re- 
quirements in the same fashion. Since A is completely determined by the dis- 
tribution of {Y,}, it is not changed by substituting for M a matrix 
equivalent to M. 

Now let X be a nonsingular N X N matrix satisfying (i), (ii) and (iii) such 
that all elements of X-'AX are nonnegative. Define M = || m,,;|| = X ‘AX. 
We wish to show that M is a Markov matrix with a unique all-positive stationary 
distribution, and that (f, M) generates the process { Y,}. 


N D—1 
> m; = > 8 za mis 7;)(D), 
j=1 p= f(j)—e 


and by (iii), 


MaRS) = VXVAGX RD) = XT VAP (8a, +++ 5 Sunn 3D) 


“ 
= XUP8a 9 °°* » Sen(e) 5 DB) = RAD Pe 


Therefore >>; m;; = 1 for every i; so M is a Markov matrix. 
We next show that the collection of positive numbers in the first row of each 
> » ; ‘ ° . , ; s 
of the X,, form a stationary distribution for M. Let Q.(@) = {first row of X,.}. 
Now 


Q(D)My = QUD)XLALX 
= {1,0,---, O}A, Xn 


= fay(e), --* , Any (€)} Xup - 


lifj =1 
> a,j(€) i 


if jf ¥ 1. 


Therefore 


> Q()M., - {1, 0, 9 


= Q,(Z). 





696 EDGAR J. GILBERT 


Now by (ii), every element of Qi(D) is positive; so this is indeed the stationary 
distribution associated with M. 

Finally, we show that if {X,} is a stationary Markov chain with transition 
matrix M, then {f(X,)} has the same distribution as { Y,}. Let s = pee --- €n€ 
be any finite sequence of integers from {0, 1, --- , D — 1}. Then 


Prob{ (f(X1), --- , f(Xn42)) s} Qh( ZS) M yeM eres --» M, RAD) 
= Qi D)XwApeAne +: AgeXek(D) 


Il 
Il 


= OS MaMa e, *:* Aa Phoa, *** 
Bence) 3 D) 

te eee | tT ee ee. 
€1€2 *** €n€) 

= P\ueiés *** En€). 


This completes the proof. 

The “identifiable” parameters of M are contained in the matrix A, and the 
free entries in X are “‘unidentifiable,”’ since they may be changed without chang- 
ing the distribution of {Y,}. Since the only real restrictions on elements of X,. 
are that each row have the proper sum, there are > n(e){n(e) — 1} = 
do {n(e)}? — N “unidentifiable” parameters associated with M. In general, 
there are N(N — 1) free parameters in a Markov matrix; so there are in general 
N — > {n(e)}? “identifiable” parameters associated with M. 

Since the distribution of { Y,} determines and is determined by the matrix A, 
any representation of {Y,} by (f’, M’), where M’ is larger than M, would have 
the same number of “identifiable” parameters and would simply include more 
“unidentifiable” parameters. Also no representation (f”, M”) with M” smaller 
than M is possible. So the representation (f, X-'AX) would seem to be a com- 
plete solution of the identifiability problem for regular functions of a Markov 
chain. 


3. General Case and Unsolved Problems. If {Y,} is a function of a finite 
Markov chain and >>n(e) < N, it is still possible that a representation can be 
found for {Y,} as a function of a Markov chain having > n(e) states. In this 
case, all the results of Section 2 apply. The special case still remaining, that 
{Y,} is a function of a finite Markov chain, but no representation can be given 
as a function of a chain having > n(e) states, may be empty. At this time it is 
still an open question whether or not every function of a finite Markov chain 
is a regular function of a Markov chain. However, even if the case is not empty, 
a modified version of Theorem 1 still holds. The computations in proof of 
Theorem 2 prove the following: 

LemMa 3. /f {Y,} ts a stationary sequence of random variables with 
values 0,1, --- , D — 1, with p(e) > O for each «, and with >-n(e) = N’ < a, 





FUNCTIONS OF FINITE MARKOV CHAINS 697 


and if f’ is the function carrying the first n(Q) integers into 0, next n(1) integers 
into 1, etc., then there is an N’ X N’ matrix M’ of real numbers, and a 
set {m;:1 St S N’} of positive real numbers such that 


(i) Dm; = jm; = 1, 


ee / , , 
(il) i mm; = m;, and 


(ill) for any sequence 8 = €€ +++ €n, 


= DY {mi,mi, is vee mi, sinif’ (tx) =e,lsksn}. 


The fact that p(s) satisfies a recurrence relation of the type (1) follows from 
the fact that for every «, some determinant | P.(84, --- , Sencey j tary °° , bencey)| 
is nonzero, while for every s and t, |P.(8, 8, «++ , Sence) 3 t, bea, ++ 5 tencey)| = O. 
Then the matrix A defined by this recurrence relation, together with any X 
satisfying (i), (ii), and (iii) of Theorem 2 generates a matrix M’. The question 
of whether or not there is in this class of ‘“‘pseudo-Markov” matrices one matrix 
with all elements nonnegative is the question of whether or not >> n(e) < @ 
characterizes a regular function of a Markov chain. If n(0) = 2, n(e) = 1 for 
0 < « < D, then it can be shown that there is indeed such a nonnegative matrix. 
However, the writer has not yet been able to extend this result to the general 

case. 


But for any M’ havi ing the properties of Lemma 3, we may define recursively, 
with r:(@) = |, wD) = m;, 


(et) = > {mi ri(t t):f'(j) = d, 
(se) = > {¢i(s)mji:f'(j) = 4, 


and note that for every s, e, and ¢, 


p(s) = a qi(s)ri(t). 
f'(i)=—e 
All of the computations carried out in Lemma 2 and Theorem | go through for 
the functions q’ and r’, and Theorem 1 remains true with f replaced by f’ and 


N by > ne 


4. Acknowledgement. The author wishes to express his deep appreciation to 
Professor David Blackwell for providing the idea that started this work and 
for his guidance and encouragement while it was being done. 


REFERENCES 
{1} Davip BLackwELL ANp L. Koopmans, ‘On the identifiability problem for functions 
of finite Markov chains,’’ Ann. Math. Stat., Vol. 28 (1957), pp. 1011-1015. 
{2} Witu1aAM Feuer, An Introduction to Probability Theory and its Applications, Vol. I 
(second edition), John Wiley & Sons, New York, 1957. 
(3) T. E. Harris, ‘On Chains of Infinite Order,’’ Pacific J. Math., Vol. 5 (1955), pp. 707-724. 





IMBEDDED MARKOV CHAIN ANALYSIS OF A WAITING-LINE 
PROCESS IN CONTINUOUS TIME 


Donatp P. Gaver, JR. 


Westinghouse Research Laboratories 


1. Summary. Bunches of individual customers approach a single servicing 
facility according to a stationary compound Poisson process. The resulting 
waiting line process is studied in continuous time by the method of the imbedded 
Markov chain, cf. Kendall [7], [8], and of renewal theory, cf. Blackwell [3], 
Feller [5], and Smith [9]. Busy period phenomena are discussed, cf. Theorem 1, 
in which the transform of the joint d.f. of busy period duration and the number 
of departures in that duration is expressed as the root 2;(s, z) of a functional 
equation, a generalization of a result of Takdcs [12]. In terms of x;(s, z) “‘zero- 
avoiding” transition probabilities are characterized. A simple model for “‘in- 
stantaneous defection” is analyzed. Using renewal theory, ergodic properties 
of waiting line lengths and waiting times are discussed for the ‘‘general’’ process, 
jn which idle and busy periods recur. 


2. Mathematical formulation. In this section arrivals to, and departures from, 
the system (single servicing facility plus waiting line) are characterized, and 
basic random variables and probabilities associated with the resulting waiting- 
line process are described. 

For clarity the following terminology will be used: ‘time’ will refer to an 
“instant”’, specified by a real parameter ((0 S t < «), and is measured from 
some initial instant taken as origin; the time axis will also be the range space of 
certain random variables; “‘period’”’ will mean a time interval, such as (4, , ts). 
Deviations from common terminology, as in the case of “‘service times’’, will be 
pointed out when they occur. 

Arrivals. Arrivals at the system occur in accordance with a stationary com- 
pound Poisson process. Such a process can be described in terms of the following 
random variables: 


(a) {A,} is a sequence of positive independent random variables, where 
‘ r 
(2.1) PrfA, S zx] = 1—e™* (A>0); 
A can be interpreted as the period that elapses between the times of arrival 
of two successive bunches of customers. 
(b) {a,} is a sequence of random variables such that 


k 
(2.2) a, = >A; 


t=1 


a, can be interpreted as the random time of arrival of the kth bunch of customers. 


Received March 3, 1958; revised March 26, 1959. 


698 





MARKOV CHAIN ANALYSIS 699 


(c) {B,} is a sequence of positive random variables, mutually independent 
and independent of {A;}, where 


(2.3) Pr[{B, = J\ = b; Gj = a 2; 3, eee ); 
B, represents the number of customers in the bunch that arrives at time a, . 


For a fixed value of the real parameter 4(0 S t < ~) let 


(2.4) A(0,t) = > B,. 


O<aj<t 


A(0, t) is the total number of customers that arrive in the time interval (0, ¢). 
We have from (2.1), (2.2), and (2.3), 


Pr [A(0,t) = kl = a(t) = >> e™ xy br" 


n=0 


where * denotes the convolution operation. The generating function (G.F.) 
of {a,(t)} is 


(2.6) a(z,t) = > za: (t) = exp[—Adjl — b(2)}]. 
k=O 
Thus the distribution {a,(t)} is infinitely divisible, cf. Doob [4]. Note that the 
numbers of arrivals in any non-overlapping time intervals are independent ran- 
dom variables, each with distribution (2.5), ¢ being the length of an interval. 
It will be convenient for later developments te '=t {8,} be a sequence such that 


BB, =0 (n = 1,2,---,#) 
(2.7) By ay (n=i+1,i+2, ---,¢+ B,) 
Ba = Ors (n=itB+1Lit B+2,---,¢+ Bet Bera) 
Thus 8, is the time of arrival of the (bunch containing the) nth customer to 
arrive; 8, = 0 is the time of “arrival” of the < 2 0 customers present at the 
initial instant. We shall assume that the customers in a bunch are assigned 
numbers in the ranges given by (2.7), and that they receive service in the order 
specified by those numbers. 
Departures. Single customers depart from the system, with service completed, 
at random times. The departure process can be described in terms of 


(d) {S,} (n = 1, 2, 3, ---), a sequence of mutually independent random 
variables with 
(2.8) Pr{S, S z] = U(z), 


where U(x) is a d.f. with U(O) = 0. 
Put 


2 


(2.9) u(s) | e** dU(2x) 
0 


for the Laplace-Stieltjes transform (L.8S.T.) of U(2z). 





700 DONALD P. GAVER, JR. 


The random variables of the sequence {S,} are independent of those of {A;} 
and {B,}. Associate S, with 8, , i.e. S, is to be interpreted as the service-time 
(service period) of the nth customer to arrive, and consequently of the nth 
customer to be served. 


Definition (Departure Times): Let the sequence {7,,}(n = 1, 2,3, --- ) where 
To => lo = 0 


and 
(2.10) _ Sr + max(8, , T.-1) 


be called the sequence of departure times from the system. T', is the time of de- 
parture of the nth customer to receive service after time ty . 

If, at the initial instant, 7 = O in (2.7), then T; = S, + 8,. This fol- 
lows from well-known properties of the exponential d.f. If initially i > 0, 
and service of the first customer is just commencing, then 7, = S, ; if service 
of the first customer has proceeded for a time 2’, then the df. of S; is 
(U(x) — (U(2’))/1 — U(2’)). 

Definition (Number of Customers in the System): The random variable N( 77), 
(hereafter denoted by N(T,,)), defined as 


(2.11) N(TT) =i+A(t,T,) —n, where 7°) = t) = Oand N(T>) = 2, 


is the number of customers in the system at the time of the nth departure (service 
completion) after time t). Initially To = t, and N(T.) = i(t = 0, 1,--- ). 
For a fixed time, say to + ¢, the random variable 
N(t +t) = i+ A(, 7T,) + A(T,,t) —n 
(2.12) 
if neinjio <T, Stott < Tri} 


is the number of customers in the system at time tp + t. We have 


N(t + t) = N(T,) + A(T, t) 
(2.13) is ; 
if ne}n to 7 Ye _ to +  < Za4al: 
clearly N(7T,) and N(t) are non-negative integer-valued random variables, 
and N(t) is continuous on the right. 


Definition (Number of Departures). The random variable M(t) defined by 


(2.14) M( lo + th=n if lo < 7. < ty + t< Tuas ’ where To = to > 


is the number of customers who have departed (received service) in the period 
(to, to + t). 

Definition (Idle Period): An idle period is a time interval (7, , Bn41), where 
Bnsi > T, . The length of an idle period is the random variable 7; = Bri; — Tp. 
Let tf) = 0 in (2.11). Then since 8, < T,, from (2.10), N(t) > O for t such that 
(B, = t < T,) and N(t) = Ofor (T, S t < Bn41), ie. there are no customers 
in the system during an idle period, and it is preceded (and followed) by periods 





MARKOV CHAIN ANALYSIS 701 


during which customers are always present, or “busy periods” (definition next). 
From familiar properties of the exponential d.f. and from (2.1), we have 


Pr[r, S x] = I(x) = 1 —e™. 


If (T,, Baar) and (T'», Bmsi)(m # m) are any two idle periods, their lengths 
are independent random variables, each with d.f. I(x). 

Definition (Busy Period): A busy period is a time interval (8, , Tn4-) (r = 
0, 1, 2, --- ) such that 


(2.151) (i) h. > Tons 
(ii) For each 7; satisfying (8, < 7; S Tn4,) we have 
(2.15ii) T;, = 8+ Tin 
(2.15i11) (ili) Patetr = Suter + Bntrtt > Snteti + Tse 
The length of a busy period is the random variable 
t = Tair — Bn. 


Put t& = 0 in (2.11) and let the conditions (2.15) hold. Then N(t) = 0 for t 
satisfying (7,1 S t < 8,), and N(t) > O for ¢ satisfying (8, S t < T,4,). 
That is, customers are always present in the system during a busy period, and 
busy periods are preceded and followed by idle periods. It will be shown later 
that, under some circumstances, busy periods are prolonged indefinitely with 
positive probability. 

Suppose (8,, Tn+4,) and (Bm, Bmig)(m > nm + 1) are any two (non-over- 
lapping) busy periods. Then it follows from the assumed arrival and departure 
processes that their lengths are independent random variables. 

Definition. (State of the System): The pair of random variables [N(T,,), 7',] 
will be called the state of the system at the time of the nth departure. In words, the 
state of the system is the number of customers left behind by the nth departing 
customer, and the time at which this departure takes place. 

Remark: {{N(T,), Tn]} forms an imbedded Markov chain. {[N(T7',), T,]} is 
essentially the imbedded Markov chain of Kendall [7], [8], but is a somewhat 
more comprehensive description of the system. Referring to (2.5), (2.8), and 
(2.11), and recalling the independence of arrivals and departures, the one-step 
transition probabilities for the chain are, when the initial state 7 > 0, 


(2.16) P(t) = Pi(t) = Pr{iN(Tra) =i th, Ta So +t! N(T,) 


= 2, 7 = to] 


: 
, Ty , — 1, 2,3, °*- 
= [ Qni(t’) dU(t), +08 iy ea 


These transition probabilities are stationary; they do not apply when 
N(T,) = 0, for when this event occurs the system remains idle until a new 





702 DONALD P. GAVER, JR. 


bunch of customers arrives and service of the n + Ist customer can begin (see 
(2.15)). We shall therefore first (Section 3) study the chain during a busy 
period, obtaining an expression for the n-step zero-avoiding transition probabilities, 


Pi? (t) = Pr(N(T,) =3, Tn Sto +t, N(Tr) > O(k 
-N( To) = ‘. To = to| 


a se) 


(2.18) 
= PriN(T,) =j7,T,a Stitt, N(U) >O0(m st < T,) | 


-N(T>) = ‘. To = to] 


In words, P{j’(t) is the probability that the number of customers in the system 
passes from 7 > 0 at time & to 7 > 0 immediately after the nth departure, the 
latter occurring before time f + ¢, without having passed to zero in the meantime. 
Equivalently, the transition occurs “‘during a busy period’’. We call the process 
whose transition probabilities are (2.18) the busy-period process. An expression 
for the probabilities (2.18) are derived in Section 3. Making use of this, an ex- 
pression for the joint probability distribution of N(t + &) and M(t + ¢) is 
obtained. An explicit expression for the d.f. of the duration of a busy period 
results as a by-product. 

In general, transition from N(f&) = i > 0 to N(& + t) = j > O can occur 
with N(t’) = 0, where ?¢’ satisfies (& < t’ < ¢), i.e. transitions may occur with 
intermediate passages to zero. We call the process that permits such transitions 
the general process, and discuss it further in Section 5 and 6, using methods of 
renewal theory. In Section 7 the d.f. of waiting times (waiting period durations, 
in our terminology) in the general process is discussed. 


3. The busy period. In (2.18) %& can be interpreted as the instant at which 
a busy period commences, and 7’) = t as the time at which service of the first 
customer to receive service during the period begins. Because the transition 
probabilities (2.17) and (2.18) are stationary, we shall consider ¢ to be time 
measured from ft in this section. Since U(0) = 0 


(3.1) Pi? (0O-—) =0 (n = 0) 
and 
(3.2) P§}(t) = 6;;Uo(t), 


the Kronecker delta multiplied by the unit step at ¢ = 0. Prescription of other 
initial conditions is straightforward. 

To derive the transition probabilities (2.18) observe that N(T,4:) = 7 > 1 
if N(T,) =j — hand exactly h + 1 customers arrive at the system in (7, , T'n4:). 
Thus by direct enumeration the P{}’(t) satisfy the forward Chapman- 
Kolmogorov equations 


j-l t 
(3.3) PYM) = > [ PY? A(t — anu (t’) dU(t’) 
0 


h=—1 





MARKOV CHAIN ANALYSIS 


j-1 
(3.4) = 2 P\%4(t)*Pi(t). 
=—1 


To obtain a formal solution to (3.4), first introduce the Laplace-Stieltjes 
transforms (L.S.T.) 


p7”(s) [ et aP(t) 


pn(s) I e" dP,(t) 
converging at least for s > 0, and the generating function 
(3.6) Gi;(z;8) = > 2"pi7?(s) 
n=0 


the latter converging at least for |z| < 1. Then (3.4) becomes, using familiar 
properties of the L.S.T. and G.F., 


j-l 
(3.7) Gij(z; 8) — bi; = 2 >, Gi.j-a(z; 8) pals). 
ha—1 


Next introduce the generating function 


co 
(3.8) gi(Z, 2738) = > 2°G;,;(z; 8) 
j=l 
again assumed to converge at least for | z | < 1. After some simple manipulations 
we have 


(3.9) gi(z,2;8) = f- nan 
— en(s, 


where the function r(s, x) is the generating function of the transforms of the 
one-step probabilities, 


x 


r(s,t) = >> 2'p(s) = : I exp [— st — At{1 — b(x)}] dU(t), 
zt 40 


banal 
1 
uls + Af{1 — b(2x)}] 
2 


where u(s) is the L.S.T. (2.9). Formula (3.9) depends upon the unknown func- 
tion Ga(z ;s) which must now be determined. To do this we make use of the 
result of 

Lemma 3.1. Fors > Oand0 < z S 1 there exists a unique rool, 0 < x(8, z) < 1 
of the equation 


3.11) x = zuls + Afl — b(x)}). 





704 DONALD P. GAVER, JR. 


Proor: For fixed s > 0,0 < z Ss 1, i(2, z, 8s) = zuls + A{l — b(z)}J isa 


convex continuous function of xz. Furthermore, 
0 < 1,(0, 2,8) < 1,(1, 2,8) < 1. 


Therefore there is exactly one root in the interval (0, 1). 

Now if g:(z, 2; 8) is to generate (transforms of) probabilities, it must be 
bounded at least for allO < x S 1. Since Lemma 3.1 shows that the denominator 
of (3.9) has a zero in this interval, we determine the numerator so as to keep 
the expression hounded. Thus we have 


(3.12) 2Gin(2; 8)ps(8) = 2i(8, 2) 


and substituting this into (3.9) completely determines g;(z, x; s). For a similar 
argument see Bailey [1] and Bene& [2]. 

The expression (3.12) has an interesting probabilistic interpretation in its 
own right. Let rs.” be a random variable such that 


(3.13) gs = T.a— Te (i,m > 0) 


where N(7,) = i and N(T,,) = 0 for the first time thereafter. Putting this more 
informally, r,,’ is the length of a busy period that begins with exactly i cus- 
tomers present, the first just commencing service, and ends with the departure 
of the mth customer to receive service. Let M‘” = 1 be a random variable de- 
noting the number of customers discharged by the server during a busy period 
that begins with 7 customers present. If we let 


F(t; %) = Pr[rS? < t, M° = m|N(To) = i, 
then, because of independence, 
(3.14) F,.(t; 7) = Pi’ (t)*P_(t). 


If we now introduce the L.S.T. and G.F. we obtain from (3.14) 


f.(s;i) = | eo > 2" dFm(t;i) = 2Ga(z; 8) p(s) 
0 


m= 


(3.15) 


x1(8, 2) 
We have proved 

TueoreM 1. The G.F. of the L.S.T. of the joint distribution of busy period dura- 
tion and the corresponding number of departures in that duration is given by 


f.(83;%) = 2,(8, 2) 


where x,(8, z) is the root of (3.11) discussed in Lemma 3.1. 

In Theorem | it is assumed that the busy period begins with a single customer 
in the system. If it begins with i = 1 present, the appropriate transform is x;(s, z); 
if it begins with the entry of a bunch of customers the transform is b[x,(s, 2), 
b(x) being the G.F. of bunch size. 

An explicit solution of (3.12) will now be given as 





MARKOV CHAIN ANALYSIS 


THEOREM 2. For s > 0 and |z\| < 1 the root 2(s, z) can be written as 


« ~ —\t m—1 n 
(3.16a) a(s,z) = D> ” oti SS (At)" ont, dU™*(t); 
0 


m=1 ™ n=d n! 
consequently 
t —yt’ m—1 s\n 
(3.16b) F,,(t;1) = [ aie ae (at) bn, dU™*(t’) 
0 mm nmd TN: 


Proor: Since u[s + A{l — b(x)}] is an analytic function of z at least for 
|2| < 1 we can apply Lagrange’s Theorem for series reversion [13] to (3.11); 
making use of the fact that 


u"[s + A{1 — b(z)}] = é: exp [— of — M{1 — b(e)}] aU"*(t) 


and expanding around the origin in powers of z gives (3.16). Observe that 
F,,(t; 1) can be interpreted as the d.f. of the first-passage time (period) to zero 
from an arbitrary instant at which one customer is present, the latter just com- 
mencing service, and the number of customers receiving service in that time 
(period). 

Expansion (3.16a), together with (3.11), can be used to verify that the root 
2(8, 2) is the transform of a bivariate d.f. With the aid of this fact it can be 
shown by direct series expansion that the functions whose transforms are given 
by (3.9) and (3.12) are the transition probabilities of a Markov process. The 
uniqueness of the solutions to (3.4) is guaranteed by properties of the transforms. 
We state 

Tueorem 3. The G.F. of the L.S.T. of the transition probabilities P{})(t) is 
given by (3.9) in terms of the root x,(s, 2) of Lemma 3.1: 

Finally, we can obtain an expression for the joint d.f. of N(t), the number 
of customers present in the system at a fixed time ¢ measured from the beginning 
of a busy period, and M(t), the number of customers who have been serviced 
by that time. Let 


Pi}’(t) = Pr{N(t) = J, Ta St < Tau, 
N(v’) >O0(O0 St St) | N(T») ; To = 0}, 
= Pr(N(t) =), 
N(’) >O0(O St St)|N(To) = 9, To = 0] 


be that joint distribution. For the marginal d.f. of N(t) alone, with the condi- 
tion that N(t’!) > 0 (0 Ss t S t), we write 


P, ,(t) = Pr(N(t) =, Ni) >0(0st St)| N(To) =%, 
(3.18) 
To = 0}, 





706 DONALD P. GAVER, JR. 


then, by simple enumeration and independence, 
i 


(3.19) PIP (t) = Do PHP (oy * 1 — U(d)ja,_;-(t)] 


/ 


j=l 


Summation on n gives a corresponding expression for (3.18). 
Now introduce the Laplace transform of P{}?(t), 


ps}’(s) - | e PS" (t) dt 
0 


and the G.F. with respect to n and j, denoting the result by g,(z, x; 7). Then a 
few manipulations, using (3.19) and the properties of transforms of convolu- 
tions, show that 


i a 1 — uls + A — Ab(z)] 
(3.20) gi(z,2;8) = ale,30)4 Lawl + ate 


the transform of P,;(¢) is obtained by letting z tend to one in (3.20). 


4. Busy periods with instantaneous defections. The functional equation for 
the joint distribution (3.11) of busy period length and number of departures 
will now be derived directly, and certain ergodic properties of the process de- 
duced. Our method is basically the same as that used by Takacs [12] for deriving 
the marginal distributions of this joint distribution when arrivals are Poisson. 
We shall, however, generalize the arrival process slightly to allow for “‘instantane- 
ous defection’”’. By instantaneous defection we shall mean that when a customer 
arrives and finds that he cannot be served immediately he joins the queue with 
binomial probability p, independent of the state of the system, and departs im- 
mediately without waiting for service with probability g = 1 — p. Such an 
assumption is reasonable wher the arriving customer can only discover whether 
or not he must wait, and not how long. This state of affairs is not uncommon in 
actual congestion situations. We wish to find the joint distribution of a busy 
period length, the number of discharged customers, and the number of defecting 
customers in the busy period. 

Let A,(t) be the number of customers who arrive in a time interval of length 
t who wait, and A.(t) the number who arrive but immediately defect. Then by 
our assumption about the defection process 


Pr [Ay(t) = k,, Ao(t) = | A(t) = A) = (i) pg’. 


Since A(t) has distribution {a,(t)}, we have for the joint distribution of waiting 
and defecting arrivals 


(4.1) Ay, ko(t) = a(t) (‘) pq’, k, + ke = k. 
1 


The generating function of this joint distribution is, from (2.6), 





MARKOV CHAIN ANALYSIS 


$$a(x_1, x_2; t) = \sum_{k_1=0}^\infty \sum_{k_2=0}^\infty x_1^{k_1} x_2^{k_2}\, a_{k_1,k_2}(t) = a(p x_1 + q x_2; t). \tag{4.2}$$
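Identity (4.2) says that binomial thinning of the arrivals simply substitutes $px_1 + qx_2$ into the arrival G.F. The sketch below checks this numerically by building $a_{k_1,k_2}(t)$ term by term from (4.1). The arrival model is an assumption made only for the example: bunches arrive at Poisson rate `lam` with geometric sizes $P(B = j) = (1-g)g^{j-1}$, so that $a(x; t) = \exp\{-\lambda t(1 - b(x))\}$. The helper names are hypothetical, and $a_k(t)$ is computed with the Panjer recursion for compound Poisson laws, which is not a method used in the paper.

```python
import math

# Hypothetical arrival model for illustration: Poisson(lam) bunch arrivals with
# geometric sizes P(B = j) = (1 - g) * g**(j - 1), j >= 1, so that
# a(x; t) = exp(-lam * t * (1 - b(x))).


def bunch_gf(x, g):
    """b(x) for the geometric bunch-size law."""
    return (1 - g) * x / (1 - g * x)


def arrival_gf(x, lam, t, g):
    """a(x; t), the G.F. of the number of arrivals in an interval of length t."""
    return math.exp(-lam * t * (1 - bunch_gf(x, g)))


def arrival_probs(lam, t, g, kmax):
    """a_k(t) for k = 0..kmax via the Panjer recursion for a compound Poisson law."""
    b = [0.0] + [(1 - g) * g ** (j - 1) for j in range(1, kmax + 1)]
    a = [math.exp(-lam * t)]  # b_0 = 0, so a_0(t) is the chance that no bunch arrives
    for k in range(1, kmax + 1):
        a.append(lam * t / k * sum(j * b[j] * a[k - j] for j in range(1, k + 1)))
    return a


def thinned_gf(x1, x2, lam, t, g, p, kmax=150):
    """Left side of (4.2): sum of x1**k1 * x2**k2 * a_{k1,k2}(t), with a_{k1,k2} from (4.1)."""
    q = 1 - p
    total = 0.0
    for k, ak in enumerate(arrival_probs(lam, t, g, kmax)):
        for k1 in range(k + 1):
            k2 = k - k1
            total += ak * math.comb(k, k1) * p ** k1 * q ** k2 * x1 ** k1 * x2 ** k2
    return total


lhs = thinned_gf(0.7, 0.4, lam=1.0, t=2.0, g=0.5, p=0.6)
rhs = arrival_gf(0.6 * 0.7 + 0.4 * 0.4, lam=1.0, t=2.0, g=0.5)
print(lhs, rhs)  # equal up to truncation at kmax and rounding
```

The truncation at `kmax` is harmless here because the mean number of arrivals in the interval is only 4.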


Let us define the random variables $\tau^{(i)}$, $M_1^{(i)}$, and $M_2^{(i)}$ just as we did $\tau^{(i)}$ and $M^{(i)}$: $\tau^{(i)}$ is the length of a busy period that begins with $i$ customers present and ends with the discharge of the $m_1$st customer to receive service, $M_1^{(i)}$ is the number of customers receiving service, and $M_2^{(i)}$ the number who come to the system but immediately defect.

Let

$$F_{m_1,m_2}(t) = \Pr[\tau^{(1)} \le t,\ M_1^{(1)} = m_1,\ M_2^{(1)} = m_2]. \tag{4.3}$$


We can now write down directly the equations to be satisfied by $F_{m_1,m_2}(t)$. Suppose that at some initial instant the system is in the state $[N(T_0) = i, T_0 = 0]$, i.e. $i \ge 1$ customers are present and one has just commenced service. Then in order to pass to the zero state after exactly $m_1 + 1$ customers depart, the last departure occurring at some time not later than $t$, the system must (a) pass from $[N(T_0) = i, T_0 = 0]$ to $[N(T_1) = j, T_1 = t']$, where $t' \le t$, and then (b) pass from $[N(T_1) = j, T_1 = t']$ to $[N(T_{m_1+1}) = 0, T_{m_1+1} \le t]$. These events are independent. Furthermore, the events of passing from $[N(T_k) = j, T_k = 0]$ to $[N(T_{k+m}) = j - 1, T_{k+m} \le t]$ $(j > 1)$ for the first time are independent, have the same probability, and do not depend upon $j$ and $k$. Introduction of the defections does not materially alter the above observations. The following equations result:


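$$F_{1,m_2}(t) = \int_0^t a_{0,m_2}(\tau)\,dU(\tau),$$

$$F_{m_1+1,m_2}(t) = \sum_{k_1=0}^{m_1}\sum_{k_2=0}^{m_2}\Bigl[\int_0^t a_{k_1,k_2}(\tau)\,dU(\tau)\Bigr] * F^{(k_1)*}_{m_1,\,m_2-k_2}(t). \tag{4.4}$$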


Introducing the G.F.

$$F(z_1, z_2; t) = \sum_{m_1=1}^\infty \sum_{m_2=0}^\infty z_1^{m_1} z_2^{m_2} F_{m_1,m_2}(t), \tag{4.5}$$

convergent at least for $|z_1|, |z_2| \le 1$, and the L.S.T.

$$f_{z_1,z_2}(s) = \int_0^\infty e^{-st}\,d_t F(z_1, z_2; t), \tag{4.6}$$

we obtain, using (4.2) and (3.10) and the properties of generating functions, the functional equation

$$f_{z_1,z_2}(s) = z_1\,u\bigl[s + \lambda - \lambda b\{p f_{z_1,z_2}(s) + q z_2\}\bigr]. \tag{4.7}$$

For $s > 0$, $|z_1| \le 1$, $|z_2| \le 1$, equation (4.7) has a single root of modulus less than unity. This root is the transform of the joint d.f. (4.3). An explicit solution of (4.7) can be given using a Lagrange expansion, but we shall content ourselves with the information obtainable directly from (4.7).





708 DONALD P. GAVER, JR. 


Assume that the service times have moments

$$m_i = \int_0^\infty t^i\,dU(t), \qquad i = 1, 2, \tag{4.8a}$$

and let $\sigma^2 = m_2 - m_1^2$. Let

$$\beta_i = E(B^i), \qquad i = 1, 2, \cdots, \tag{4.8b}$$

denote the moments of the bunch size. It is convenient to define $\rho = \lambda\beta_1 m_1$, the traffic intensity parameter.


Then

LEMMA 4.1. (a) $\dfrac{d}{dx}\,u[\lambda - \lambda b(px + q)]\Big|_{x=1} = p\rho$;
(b) $f_{1,1}(0) = 1$ if $p\rho \le 1$;
(c) $f_{1,1}(0) = \epsilon < 1$ if $p\rho > 1$.

PROOF. Direct differentiation establishes (a). By Abel's theorem, $f_{1,1}(0)$ satisfies (4.7) with $s = 0$, $z_1 = z_2 = 1$. Then, using (a) and the continuity and convexity of $u[\lambda - \lambda b(px + q)]$, (b) and (c) follow.

THEOREM 4. When $p\rho \le 1$, busy periods end in finite time with probability one. When $p\rho > 1$, busy periods last indefinitely long with probability $1 - \epsilon$, where $\epsilon$ is the root (less than unity) of

$$x = u[\lambda - \lambda b(px + q)], \qquad 0 \le x \le 1. \tag{4.9}$$


In many practical situations busy periods alternate with idle periods; by Section 2 the latter are independent, exponentially distributed random variables. We call this the "general" process, and discuss it further in Section 5. Several properties of such processes are apparent from Theorem 4. Since, when $p\rho \le 1$, return to $N = 0$ from any state will always occur in finite time, this event is "persistent" in the language of recurrent events [6]. It follows from the theory of recurrent events that $N = j$ will occur infinitely often in any realization of the general process. When $p\rho > 1$, return to $N = 0$ is "transient", and the event $N = j$, $j$ finite, will occur only finitely many times during a process realization; eventually the number of customers in the system grows indefinitely.

From (4.9) we can deduce other properties of "transient" systems, i.e. those for which $p\rho > 1$. Suppose two queueing systems are confronted with identical arrival patterns but have different service time distributions, $U_1(t)$ and $U_2(t)$. Suppose further that $p\rho_1 = p\rho_2 > 1$. Then, if the L.S.T.'s of $U_1(t)$ and $U_2(t)$ satisfy

$$u_1[\lambda - \lambda b(px + q)] > u_2[\lambda - \lambda b(px + q)]$$

for all $x$ in the interval $(0, 1)$, we observe immediately that $\epsilon_1 > \epsilon_2$. Hence, at least in some cases, the relative tendency of the busy periods of different systems to be indefinitely prolonged when traffic intensity exceeds unity can be deduced directly from the L.S.T.'s of the respective service time distributions. As an example, if $U_1(t) = 1 - e^{-t}$ and $U_2(t)$ is the distribution of "constant" service times, both having unit means, we see from their transforms that $\epsilon_1 > \epsilon_2$, and can conclude that a saturated ($p\rho > 1$) "exponential" system is likely to have a greater number of finite busy periods before becoming permanently busy than is a saturated "constant" system with the same mean service time and arrival process.
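The comparison can be checked numerically. The sketch below finds $\epsilon$ as the smallest root of (4.9) by fixed-point iteration from $x = 0$; this numerical method is our choice and is not taken from the paper. It assumes single arrivals ($b(x) = x$) and $p = 1$. For exponential service with unit mean, (4.9) becomes $\lambda x^2 - (1 + \lambda)x + 1 = 0$, whose roots are $1$ and $1/\lambda$, so $\epsilon = 1/\lambda$ when $\rho = \lambda > 1$. That closed form serves as a check. The function name is hypothetical.

```python
import math


def smallest_fixed_point(u, lam, b, p=1.0, tol=1e-13, max_iter=100_000):
    """Smallest root in [0, 1] of x = u[lam - lam * b(p*x + q)], i.e. equation (4.9).

    The map is increasing in x, so iterating it from x = 0 gives an increasing
    sequence bounded by the smallest fixed point, hence converging to it.
    Convergence becomes slow when p * rho is close to 1.
    """
    q = 1.0 - p
    x = 0.0
    for _ in range(max_iter):
        x_new = u(lam - lam * b(p * x + q))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


single = lambda x: x                   # b(x) = x: one customer per arrival
exp_lst = lambda s: 1.0 / (1.0 + s)    # L.S.T. of U(t) = 1 - exp(-t), mean 1
const_lst = lambda s: math.exp(-s)     # L.S.T. of a service time identically 1

eps_exp = smallest_fixed_point(exp_lst, 2.0, single)      # rho = 2
eps_const = smallest_fixed_point(const_lst, 2.0, single)
print(eps_exp, eps_const)  # eps_exp exceeds eps_const, as argued above
```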

Returning now to (4.7), we discuss the moments of the busy period length, $\tau^{(1)}$, and of the number of discharges, $M^{(1)}$, during a busy period, when we suppose no defections occur ($p = 1$). Thus, first setting $z_1 = 1$ in (4.7) we have

$$E[\tau^{(1)}] = \frac{m_1}{1-\rho}, \qquad \operatorname{Var}[\tau^{(1)}] = \frac{\sigma^2 + \rho m_1^2(\beta_2/\beta_1)}{(1-\rho)^3}, \qquad \rho < 1. \tag{4.10}$$

We observe that the mean busy period length depends only upon the mean number of arrivals in a unit time interval and the mean service time, while the variance of busy period length in general depends upon the second moment $\beta_2$ of the distribution $\{b_j\}$ as well as the means and variances of arrivals and service times.

The moments of the distribution of the number of customers discharged during a busy period are, similarly,

$$E[M^{(1)}] = \frac{1}{1-\rho}, \qquad \operatorname{Var}[M^{(1)}] = \frac{\rho^2(\sigma^2/m_1^2) + \rho(\beta_2/\beta_1)}{(1-\rho)^3}. \tag{4.11}$$

Again, the mean number of discharges depends only on mean arrivals and the mean service time. The variance depends upon second moments, both of arrivals and of service times, in much the same way as did $\operatorname{Var}[\tau^{(1)}]$.
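Formulas (4.10) and (4.11) are easy to evaluate and to check against a simulation. The sketch below does both. The sampler functions are hypothetical, and the simulation is an illustration we added, not part of the paper. It relies on the fact that the length of a busy period and the number served do not depend on the order of service, so it simply tracks how many customers remain to be served. In the M/M/1 test case ($\lambda = 0.5$, unit exponential service, single arrivals), (4.10) gives mean 2 and variance 12, and (4.11) gives mean 2 and variance 6.

```python
import random


def busy_period_moments(lam, beta1, beta2, m1, sigma2):
    """Mean and variance of tau^(1) and M^(1) from (4.10)-(4.11), with p = 1."""
    rho = lam * beta1 * m1
    if rho >= 1:
        raise ValueError("the moments are finite only for rho < 1")
    d = 1.0 - rho
    mean_tau = m1 / d
    var_tau = (sigma2 + rho * m1 ** 2 * beta2 / beta1) / d ** 3
    mean_m = 1.0 / d
    var_m = (rho ** 2 * sigma2 / m1 ** 2 + rho * beta2 / beta1) / d ** 3
    return mean_tau, var_tau, mean_m, var_m


def simulate_busy_period(rng, lam, service, bunch):
    """One busy period begun by a single customer (no defections).

    service() and bunch() are hypothetical samplers returning a service time and a
    bunch size; bunches arrive as a Poisson process of rate lam during each service.
    """
    waiting, length, served = 1, 0.0, 0
    while waiting > 0:
        s = service()
        length += s
        served += 1
        waiting -= 1
        clock = rng.expovariate(lam)  # arrival epochs of bunches during this service
        while clock < s:
            waiting += bunch()
            clock += rng.expovariate(lam)
    return length, served


rng = random.Random(7)
runs = [simulate_busy_period(rng, 0.5, lambda: rng.expovariate(1.0), lambda: 1)
        for _ in range(20_000)]
print(busy_period_moments(0.5, 1, 1, 1.0, 1.0))
print(sum(r[0] for r in runs) / len(runs), sum(r[1] for r in runs) / len(runs))
```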
The expected number of defections ($p < 1$) during a busy period comes directly from (4.7) by differentiation. We have

$$E[M_2^{(1)}] = \frac{(1-p)\rho}{1 - p\rho}, \qquad p\rho < 1.$$

The corresponding expected number of discharges following service is

$$E[M_1^{(1)}] = \frac{1}{1 - p\rho}, \qquad p\rho < 1.$$

We can thus conclude that busy periods end in finite time and the expected number of defecting customers equals or exceeds the expected number actually served if, and only if,

$$p\rho < 1 \le (1-p)\rho. \tag{4.12}$$

In order for this to be true, at least half of the arriving customers must defect, on the average.
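Because both expectations share the denominator $1 - p\rho$, comparing them reduces to comparing $(1-p)\rho$ with 1, which gives (4.12). The small sketch below, whose function name is hypothetical, makes that comparison explicit. For example, $\rho = 2$ and $p = 0.3$ give $3.5$ expected defections against $2.5$ expected discharges.

```python
def defection_summary(rho, p):
    """E[M_2^(1)] and E[M_1^(1)] for a busy period begun by one customer."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if p * rho >= 1:
        raise ValueError("the expectations are finite only when p * rho < 1")
    expected_defections = (1 - p) * rho / (1 - p * rho)  # E[M_2^(1)]
    expected_served = 1 / (1 - p * rho)                  # E[M_1^(1)]
    return expected_defections, expected_served


d, s = defection_summary(2.0, 0.3)
print(d, s, d >= s)  # condition (4.12) holds: p*rho = 0.6 < 1 <= (1 - p)*rho = 1.4
```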


5. The general process. The formulation of Section 2 implies that transitions 
between states may occur with the intervention of one or more idle periods. 
We call the process permitting such transitions the general process, and in this 
section relate it to the busy period process discussed earlier. 

In a realization of the general process new busy periods begin at the sequence of random times $(0 < t_1 < t_2 < t_3 < \cdots)$, where time is measured from an arbitrary initial instant. We shall call $t_k$ the time of the beginning of the $k$th busy period. From (2.15) we have that


t.¢ {B.| Ba > Trea} 


DEFINITION. The interval $(t_{k-1}, t_k)$ will be called a (the $k$th) renewal period. The random variable $\tau_R(k) = t_k - t_{k-1}$ is the length of the $k$th renewal period. In the general process

$$\tau_R(k) = \tau(k) + \tau_I(k), \qquad k \ge 2, \tag{5.1}$$

where $\tau(k)$ is the length of the busy period that immediately follows the arrival at $t_{k-1}$, and $\tau_I(k)$ is the length of the idle period immediately following the latter busy period and preceding the arrival at $t_k$. We have, from Section 2,

$$\Pr[\tau_I(k) \le x] = 1 - \exp(-\lambda x). \tag{5.2}$$


Since the busy periods beginning at $t_k$ ($k \ge 1$) each commence with the arrival of a bunch of customers, the latter having bunch-size distribution $\{b_i\}$, we have from Section 3

$$F(t) = \Pr[\tau(k) \le t] = \sum_{i=1}^\infty b_i F(t; i), \tag{5.3}$$

where

$$F(t; i) = \Pr[\tau^{(i)} \le t] = \sum_{m=1}^\infty F_m(t; i). \tag{5.4}$$


From Section 2 and the above, $\{\tau_I(k)\}$ ($k \ge 2$) is a sequence of independent random variables, each having the exponential d.f. (5.2), and $\{\tau(k)\}$ ($k \ge 2$) is a sequence of independent random variables with d.f. (5.3). It follows from (5.1) that the renewal periods $\{\tau_R(k)\}$ ($k \ge 2$) form a sequence of independent, identically distributed random variables with d.f.

$$R(t) = \Pr[\tau_R(k) \le t] = F(t) * I(t). \tag{5.5}$$

Such a sequence of random variables is called a renewal process; cf. Blackwell [3], Feller [5], Smith [9], [10]. We can therefore state

THEOREM 5. The sequence of renewal periods $\{\tau_R(k)\}$ ($k \ge 2$) constitutes a renewal process.






Note that because of the imposition of initial conditions, the d.f. of $\tau_R(1)$, the time from $T_0 = 0$ to the beginning of the first busy period that starts after $T_0$, is

$$R(t; i) = \Pr[\tau_R(1) \le t] = \begin{cases} F(t; i) * I(t), & i > 0, \\ I(t), & i = 0, \end{cases}$$

where we take the initial state to be $[N(T_0) = i, T_0 = 0]$ when $i > 0$, a service beginning at $T_0 = 0$. From the above considerations the d.f. of $t_k$, the time to the beginning of the $k$th busy period, is given by

$$r_k(t; i) = \Pr[t_k \le t \mid N(T_0) = i, T_0 = 0] \tag{5.6}$$

$$r_k(t; i) = R(t; i) * R^{(k-2)*}(t) \qquad (k \ge 2;\ i > 0), \tag{5.7}$$

and, when $N(T_0) = 0$, by

$$r_k(t; 0) = I(t) * R^{(k-1)*}(t) \qquad (k \ge 1;\ i = 0), \tag{5.8}$$

and we adopt the usual convention that $R^{0*}(t) = U_0(t)$, the unit step at the origin. We shall call $r_k(t; i)$ the renewal distribution.

Now consider the probability that $N(t) = j > 0$ in the general process, given that at $t = 0$ we have $[N(T_0) = i, T_0 = 0]$ ($i > 0$), i.e. initially there are $i$ customers in the system and one is just commencing service. Then either

(a) $N(t) = j > 0$ and $N(t') > 0$ ($0 \le t' \le t$), i.e. at time $t$ the number of customers present is $j$ and the first busy period has not terminated, or

(b) $N(t) = j > 0$ and $N(t') = 0$ at least once in $0 \le t' \le t$, i.e. at time $t$ the number of customers present is $j$ and at least one busy period has elapsed.

The probability of the event (a) is $P_{ij}(t)$, as given by (3.18). The probability of the event (b) is easily seen to be

$$\sum_{k=2}^\infty P_j(t) * r_k(t; i) \qquad (i, j \ge 1), \tag{5.9}$$

where

$$P_j(t) = \sum_{i=1}^\infty b_i P_{ij}(t). \tag{5.10}$$

Summing the probabilities of the mutually exclusive events (a) and (b) we obtain
THEOREM 6. Let

$$Q_{ij}(t) = \Pr[N(t) = j \mid N(T_0) = i, T_0 = 0] \tag{5.11}$$

be the distribution of $N(t)$ in the general process described. Then $Q_{ij}(t)$ is expressible in terms of $P_{ij}(t)$, the d.f. of $N(t)$ for the busy period process, and the renewal distribution $r_k(t; i)$: when $i > 0$,

$$Q_{ij}(t) = P_{ij}(t) + \sum_{k=2}^\infty P_j(t) * r_k(t; i) \qquad (i, j \ge 1), \tag{5.12}$$

and

$$Q_{i0}(t) = F(t; i) * [1 - I(t)] + \sum_{k=2}^\infty F(t) * [1 - I(t)] * r_k(t; i); \tag{5.13}$$

when $i = 0$,

$$Q_{0j}(t) = \sum_{k=1}^\infty P_j(t) * r_k(t; 0) \tag{5.14}$$

and

$$Q_{00}(t) = [1 - I(t)] + \sum_{k=1}^\infty F(t) * [1 - I(t)] * r_k(t; 0). \tag{5.15}$$


Lastly we sketch the derivation of the joint distribution of the random variables $N(t)$ and $M(t)$ in the general process. Observe that the joint d.f. of $\tau_R(k)$, the $k$th ($k \ge 2$) renewal period, and $M(k)$, the number of customers receiving service in $\tau_R(k)$, is given by

$$R_n(t) = \Pr[\tau_R(k) \le t, M(k) = n] = \sum_{i=1}^\infty b_i F_n(t; i) * I(t). \tag{5.16}$$

It follows from independence considerations that the joint d.f. of $t_k$, the time to the beginning of the $k$th busy period, and $M(t_k)$, the number of service completions in that time, is given by

$$r_k^{(n)}(t; i) = R_n(t; i) * R_n^{(k-2)*}(t) \qquad (k \ge 2;\ i > 0), \tag{5.17}$$

and, when $N(T_0) = 0$, by

$$r_k^{(n)}(t; 0) = I(t) * R_n^{(k-1)*}(t) \qquad (k \ge 1;\ i = 0). \tag{5.18}$$

The convolution operation is to be understood as applying to both $n$ and $t$.
Next let

$$Q_{ij}^{(n)}(t) = \Pr[N(t) = j, M(t) = n \mid N(T_0) = i, T_0 = 0] \tag{5.19}$$

be the joint distribution of $N(t)$ and $M(t)$. Then, by an argument analogous to that giving Theorem 6, we obtain

$$Q_{ij}^{(n)}(t) = P_{ij}^{(n)}(t) + \sum_{k=2}^\infty P_j^{(n)}(t) * r_k^{(n)}(t; i) \qquad (i > 0), \tag{5.20}$$

where again convolution applies to both $n$ and $t$. The marginal distribution of $M(t)$ alone is obtained by summing on $j$ and adding $Q_{i0}^{(n)}(t)$. We observe that it is only necessary to omit the summation on $k$ in (5.20) to obtain the joint distribution of $N(t)$, $M(t)$, and $K(t)$, where the latter is a random variable denoting the number of busy periods that have terminated by time $t$.


6. Ergodic properties of the general process. We shall now investigate the 
ergodic properties of the random variable N(t) in the general process with the aid 
of a result in renewal theory. 







DEFINITION. The expression

$$r'(t; i) = \sum_{k=2}^\infty \frac{d\,r_k(t; i)}{dt} \tag{6.1}$$

will be called the renewal density.

In words, $r'(t; i)\,dt + o(dt)$ is the probability that a busy period begins in the time interval $(t, t + dt)$. We observe that $r'(t; i)$ exists for all $t$ since $R(t)$ is the convolution of $F(t)$ with $I(t)$, the latter being absolutely continuous.

Under broad conditions the renewal density converges to a constant as $t \to \infty$. We make use of a result of W. L. Smith [9], [10]. For similar results cf. Feller [5] and Blackwell [3].

THEOREM 7 (W. L. Smith). If

(i) the renewal periods $\{\tau_R(k)\}$ are non-negative and $E[\tau_R(k)] \le \infty$,

(ii) $dR(t)/dt \in L_{1+\delta}$ for some $\delta > 0$,

(iii) $dR(t)/dt$ tends to zero as $t$ tends to infinity,

then

$$\lim_{t\to\infty} r'(t; i) = \frac{1}{E[\tau_R]}. \tag{6.2}$$
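Referring to the definition of renewal periods, the expression (5.5), and Theorem 4, it is easy to verify that the conditions of Theorem 7 are satisfied. A busy period begun by a bunch of $i$ customers is the sum of $i$ independent busy periods each begun by a single customer, so by (4.10) $E[\tau(k)] = \beta_1 m_1/(1-\rho)$, while $E[\tau_I(k)] = 1/\lambda$ by (5.2). Hence, from (5.1), (5.2), (5.3), (4.10), and Theorem 4,

$$E[\tau_R] = \begin{cases} \dfrac{\beta_1 m_1}{1-\rho} + \dfrac{1}{\lambda} = \dfrac{1}{\lambda(1-\rho)}, & \rho < 1, \\ \infty, & \rho \ge 1. \end{cases} \tag{6.3}$$

We have, then,

THEOREM 8. The renewal density $r'(t; i)$ tends to a constant as $t$ tends to infinity:

$$\lim_{t\to\infty} r'(t; i) = \begin{cases} \lambda(1-\rho), & \rho < 1, \\ 0, & \rho \ge 1. \end{cases} \tag{6.4}$$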
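Theorem 8 says that in the long run busy periods start at rate $\lambda(1-\rho)$. The sketch below checks this by simulating one long realization of the general process and counting bunches that arrive to find no unfinished work. The model parameters are assumptions made only for the example: Poisson bunch arrivals at rate 0.3, geometric bunch sizes with mean 2, and exponential service with mean 1, so that $\rho = 0.6$. Tracking unfinished work in this way is our simulation device, not the paper's method, and the function name is hypothetical. The script also checks the algebra in (6.3).

```python
import random


def busy_period_start_rate(lam, g, m1, n_bunches, seed=1):
    """Observed rate of busy-period starts in one simulated run of the general process.

    Hypothetical model: bunches arrive at Poisson rate lam, bunch sizes are geometric
    with P(B = j) = (1 - g) * g**(j - 1), service times are exponential with mean m1,
    and a busy period starts whenever a bunch finds no unfinished work.
    """
    rng = random.Random(seed)
    clock, work, starts = 0.0, 0.0, 0
    for _ in range(n_bunches):
        gap = rng.expovariate(lam)
        clock += gap
        work = max(0.0, work - gap)  # unfinished work just before this bunch arrives
        if work == 0.0:
            starts += 1
        size = 1
        while rng.random() < g:      # geometric bunch size with mean 1 / (1 - g)
            size += 1
        work += sum(rng.expovariate(1.0 / m1) for _ in range(size))
    return starts / clock


lam, g, m1 = 0.3, 0.5, 1.0
beta1 = 1 / (1 - g)
rho = lam * beta1 * m1                           # 0.6
mean_renewal = beta1 * m1 / (1 - rho) + 1 / lam  # E[tau(k)] + E[tau_I(k)], cf. (6.3)
rate = busy_period_start_rate(lam, g, m1, 200_000)
print(rate, 1 / mean_renewal, lam * (1 - rho))
```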


Now from Theorem 6, Theorem 8, (6.1), and a simple lemma (cf. Smith [10], 
p. 14) we have 

THEOREM 9. The distribution of $N(t)$ in the general process tends to a limit independent of the initial conditions as $t$ tends to infinity:

$$\lim_{t\to\infty} Q_{i0}(t) = \begin{cases} 1 - \rho, & \rho < 1, \\ 0, & \rho \ge 1, \end{cases} \tag{6.5}$$

$$\lim_{t\to\infty} Q_{ij}(t) = \begin{cases} \lambda(1-\rho)\displaystyle\int_0^\infty P_j(t)\,dt, & \rho < 1, \\ 0, & \rho \ge 1, \end{cases} \qquad (j \ge 1). \tag{6.6}$$

Write $q_j = \lim_{t\to\infty} Q_{ij}(t)$. When $\rho < 1$ the generating function of $\{q_j\}$ is obtained from (6.5), (6.6), the developments of Section 3, and (6.3). Letting $s$ tend to zero in (3.20), referring to (3.9), and recalling that $z_1(0, 1) = 1$ ($\rho \le 1$), there results the following expression for $q(x)$, the generating function of $\{q_j\}$:

$$q(x) = \sum_{j=0}^\infty x^j q_j = \frac{(1-\rho)(1-x)\,u[\lambda\{1 - b(x)\}]}{u[\lambda\{1 - b(x)\}] - x}. \tag{6.7}$$


The classical Pollaczek-Khintchine-Kendall formula for pure Poisson single arrivals, cf. Kendall [7], [8], comes from (6.7) by setting $b(x) = x$. Formula (6.7) can also be derived using the matrix methods of Kendall. From (6.7) it can be verified that $\{q_j\}$ forms a bona fide probability distribution when $\rho < 1$; we shall term this distribution the long-run distribution of $N(t)$.
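The probabilities $q_j$ can be extracted from (6.7) numerically as the Taylor coefficients of $q(x)$. The sketch below does this with a discrete version of Cauchy's integral on a circle of radius less than one, which amounts to a discrete Fourier transform. This numerical inversion is our choice, not the paper's. Two checks are available. For exponential service with unit mean and single arrivals, (6.7) collapses to $(1-\rho)/(1-\rho x)$, so $q_j = (1-\rho)\rho^j$. In every case $q_0 = 1 - \rho$, because $b(0) = 0$. The geometric bunch law is an assumed example, and the helper names are hypothetical.

```python
import cmath
import math


def long_run_probs(u, b, lam, rho, n_terms, radius=0.8, n_points=256):
    """q_0, ..., q_{n_terms-1} from the generating function (6.7).

    Coefficients are recovered by the trapezoidal rule for Cauchy's integral on the
    circle |x| = radius < 1; x = 1 is a removable singularity of (6.7) and is never
    evaluated.
    """
    def q_gf(x):
        ux = u(lam * (1 - b(x)))
        return (1 - rho) * (1 - x) * ux / (ux - x)

    values = [q_gf(radius * cmath.exp(2j * math.pi * k / n_points)) for k in range(n_points)]
    probs = []
    for j in range(n_terms):
        coeff = sum(v * cmath.exp(-2j * math.pi * j * k / n_points)
                    for k, v in enumerate(values)) / n_points
        probs.append(coeff.real / radius ** j)
    return probs


exp_service = lambda s: 1 / (1 + s)             # u(s) for unit-mean exponential service
single = lambda x: x                            # b(x) = x
geometric = lambda x: 0.5 * x / (1 - 0.5 * x)   # assumed bunch G.F., mean bunch size 2

q_mm1 = long_run_probs(exp_service, single, lam=0.5, rho=0.5, n_terms=20)
q_bulk = long_run_probs(exp_service, geometric, lam=0.3, rho=0.6, n_terms=20)
print([round(v, 6) for v in q_bulk[:5]])
```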

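Moments of the distribution $\{q_j\}$ are available by differentiating (6.7). Thus, for example,

$$E(N) = \sum_j j q_j = \rho + \frac{\rho}{2(1-\rho)}\Bigl[\Bigl(\frac{\beta_2}{\beta_1} - 1\Bigr) + \rho\Bigl(1 + \frac{\sigma^2}{m_1^2}\Bigr)\Bigr]. \tag{6.8}$$

For a fixed value of $\rho$ the effect of a departure from pure Poisson arrivals is to increase the average number of customers waiting, since $\beta_2/\beta_1 \ge 1$, with equality only for single arrivals. This increase is more pronounced for $\rho$ close to unity than for $\rho$ small.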
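Here is a minimal sketch of (6.8), with two classical checks. For M/M/1, $E(N) = \rho/(1-\rho)$. For constant service, $E(N) = \rho + \rho^2/\{2(1-\rho)\}$. As a third check, the script compares (6.8) with a central-difference estimate of $q'(1)$ taken directly from (6.7). That check assumes exponential service with mean 1 and geometric bunches with $\beta_1 = 2$ and $\beta_2 = 6$; with $\lambda = 0.3$, (6.8) gives exactly 3. The function names are hypothetical.

```python
def mean_number_in_system(lam, beta1, beta2, m1, sigma2):
    """E(N) of the long-run distribution, formula (6.8)."""
    rho = lam * beta1 * m1
    if rho >= 1:
        raise ValueError("(6.8) applies only when rho < 1")
    return rho + rho / (2 * (1 - rho)) * ((beta2 / beta1 - 1) + rho * (1 + sigma2 / m1 ** 2))


def pgf_slope_at_one(lam, g, h=1e-4):
    """Central-difference estimate of q'(1) from (6.7), assuming exponential service
    with mean 1 and geometric bunches P(B = j) = (1 - g) g**(j - 1).

    x = 1 itself is a removable singularity of (6.7), so only 1 - h and 1 + h are used.
    """
    b = lambda x: (1 - g) * x / (1 - g * x)
    rho = lam / (1 - g)  # lam * beta1 * m1 with beta1 = 1 / (1 - g), m1 = 1

    def q(x):
        ux = 1 / (1 + lam * (1 - b(x)))
        return (1 - rho) * (1 - x) * ux / (ux - x)

    return (q(1 + h) - q(1 - h)) / (2 * h)


print(mean_number_in_system(0.5, 1, 1, 1.0, 1.0))   # M/M/1, rho = 0.5
print(mean_number_in_system(0.5, 1, 1, 1.0, 0.0))   # constant service, rho = 0.5
print(mean_number_in_system(0.3, 2, 6, 1.0, 1.0), pgf_slope_at_one(0.3, 0.5))
```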

Expression (6.7) can be expanded to yield the probabilities $q_j$ explicitly. A useful approximation to these probabilities can frequently be obtained by making use of

LEMMA 6.1. Suppose $\rho < 1$. If, for complex $x$, $b(x)$ and $u[\lambda\{1 - b(x)\}]$ converge for $1 < |x| < L$, $L$ real and greater than unity, then

$$x - u[\lambda\{1 - b(x)\}] = 0 \tag{6.9}$$

has two real roots: $x_1 = 1$ and $x_2 > 1$. The magnitudes of $x_1$ and $x_2$ are smaller than those of any other roots of (6.9). Note that the assumption that $u[\lambda\{1 - b(x)\}]$ converges for $1 < |x| < L$ implies that $1 - U(t) = o(e^{ct})$ ($t \to \infty$), where $c$ is real and negative; cf. Widder [14]. This assumption is seldom restrictive in practice.

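PROOF. For real $x < L$, $u[\lambda\{1 - b(x)\}]$ is a continuous convex function of $x$. Clearly $x_1 = 1$ satisfies (6.9), and

$$\frac{d}{dx}\,u[\lambda\{1 - b(x)\}]\Big|_{x=1} = \rho < 1.$$

Thus there exists exactly one more real root $x_2 > 1$. On the circle $C_1$: $|x| = x_2 - \epsilon > 1$ we have $|u[\lambda\{1 - b(x)\}]| < |x|$ (the power series of $u[\lambda\{1 - b(x)\}]$ in $x$ has non-negative coefficients, so its modulus on $|x| = r$ is at most its value at $x = r$, which is less than $r$ for $1 < r < x_2$). Hence by Rouché's theorem there is exactly one root of (6.9) inside $C_1$, and this must be $x_1 = 1$. Since $\epsilon$ can be made arbitrarily small, there are no complex roots of smaller magnitude than $x_2$, and there is only the root $x_2$ on $|x| = x_2$. Both $x_1$ and $x_2$ are simple.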
Now from (6.7) and familiar properties of generating functions,

$$\sum_{n=0}^\infty x^n \sum_{j=0}^n q_j = \frac{q(x)}{1 - x} \tag{6.10}$$

and

$$\sum_{j=0}^n q_j = \frac{1}{n!}\,\frac{d^n}{dx^n}\Bigl[\frac{q(x)}{1-x}\Bigr]_{x=0}. \tag{6.11}$$

Applying the Cauchy integral formula, we have

$$\sum_{j=0}^n q_j = \frac{1}{2\pi i}\oint_C \frac{q(w)}{(1-w)\,w^{n+1}}\,dw,$$

where $C$ is a circle in the $x$-plane, centered at the origin and with radius less than unity. Enlarge the contour $C$ to $C_2$, a circle with center the origin and radius $x_2 + \delta$, where $\delta > 0$ is chosen so that $x_2 < x_2 + \delta < |x_3|$, $x_3$ being the third root of (6.9), if it exists, in order of increasing magnitude. This circle surrounds the simple poles of the integrand at $x_1 = 1$ and $x_2$, so

$$\sum_{j=0}^n q_j = \frac{1}{2\pi i}\oint_{C_2} \frac{q(w)\,dw}{(1-w)\,w^{n+1}} + 1 - r_2, \tag{6.12}$$

where $r_2$ is the residue of $q(x)/\{(1-x)x^{n+1}\}$ evaluated at $x = x_2$:

$$r_2 = \frac{1-\rho}{\bigl\{-\lambda b'(x_2)\,u'[\lambda\{1 - b(x_2)\}] - 1\bigr\}\,x_2^{\,n}}. \tag{6.13}$$

Since $q(x)$ is bounded on $C_2$, we have finally

$$\sum_{j=0}^n q_j = 1 - r_2 + O\bigl((x_2 + \delta)^{-n}\bigr).$$

We state this result as

THEOREM 10. If $b(x)$ and $u[\lambda\{1 - b(x)\}]$ converge for $|x| < L$, where $L > 1$, then

$$\sum_{j=n+1}^\infty q_j \sim \frac{1-\rho}{-\lambda b'(x_2)\,u'[\lambda\{1 - b(x_2)\}] - 1}\Bigl(\frac{1}{x_2}\Bigr)^n \qquad (n \to \infty), \tag{6.14}$$

where $x_2$ is the second real root of (6.9) in order of increasing magnitude.

In other words, the long-run probability distribution of long waiting lines is asymptotically geometric. From (6.14) it is apparent that in order to reduce the probability of long lines, $x_2$ should be increased, if this is possible in practice. Because of the convexity of $u[\lambda\{1 - b(x)\}]$, $x_2$ is increased if $u[\lambda\{1 - b(x)\}]$ is decreased for each $x \ge 1$. On the basis of these observations we can state the following simple result:

THEOREM 11. $A$ and $B$ are two waiting-line systems. $A$ is characterized by an arrival process with parameter $\lambda_A$, generating function $b_A(x)$, and service time d.f. $U_A(t)$; $B$, by $\lambda_B$, $b_B(x)$, and $U_B(t)$. Then if, for all $x > 1$,

$$u_A[\lambda_A\{1 - b_A(x)\}] < u_B[\lambda_B\{1 - b_B(x)\}],$$

the long-run probability of long waiting lines, as given asymptotically by (6.14), is smaller for system $A$ than for $B$.

Although the criterion given by Theorem 11 is crude, it allows some interesting comparisons to be made. For example, suppose two systems have identical distributions of arrivals in any time interval ($\lambda_A = \lambda_B$; $b_A(x) = b_B(x)$) and the same mean service time (e.g. of unit length), but $U_A(t) = 1 - e^{-t}$, while $U_B(t)$ is a degenerate d.f. concentrated at unity. Then the probability of lines exceeding $n$ in length is greater for system $A$ than for $B$, asymptotically as $n$ tends to infinity. This result is not surprising when we compare the corresponding means and other moments of long-run line length. Similarly, if the $k$th member of a (hypothetical) sequence of systems has service time d.f. with L.S.T. $[u(s/k)]^k$ ($k = 1, 2, 3, \cdots$), and each member of the sequence has identically distributed arrivals during a time interval, then the probability of lines longer than $n$ decreases as $k$ increases, asymptotically as $n$ approaches infinity. These results can be compared to those of Smith [11].
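Theorems 10 and 11 can be made concrete by computing $x_2$ numerically. The sketch below finds $x_2$ by bisection, a method we chose for the example, and then evaluates the right side of (6.14). The derivative in (6.13) is estimated by a central difference. It assumes single arrivals with $\lambda = 0.5$ and unit-mean service, and compares exponential service with constant service as in the example above. For exponential service $x_2 = 1/\lambda = 2$, and the tail is exactly $\rho^{n+1}$, so (6.14) can be checked directly. The function names are hypothetical.

```python
import math


def second_root(a_gf, lo, hi, iters=200):
    """Bisection for the root x_2 > 1 of x = a_gf(x), cf. (6.9).

    Requires a_gf(lo) < lo and a_gf(hi) > hi with 1 < lo < x_2 < hi; since
    a_gf(x) - x is convex, it changes sign only once on such a bracket.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a_gf(mid) < mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_approx(a_gf, x2, rho, n, h=1e-6):
    """Right side of (6.14); a_gf'(x2) = -lam b'(x2) u'[lam{1 - b(x2)}] is
    estimated by a central difference."""
    slope = (a_gf(x2 + h) - a_gf(x2 - h)) / (2 * h)
    return (1 - rho) / (slope - 1) * x2 ** (-n)


lam = 0.5                                    # single arrivals, unit mean service: rho = 0.5
a_exp = lambda x: 1 / (1 + lam * (1 - x))    # u[lam(1 - x)], exponential service
a_det = lambda x: math.exp(-lam * (1 - x))   # u[lam(1 - x)], constant service
x2_exp = second_root(a_exp, 1.5, 2.9)        # equals 1 / lam = 2 here
x2_det = second_root(a_det, 1.5, 10.0)
print(x2_exp, x2_det)                        # x2_det > x2_exp, so system B has shorter lines
print(tail_approx(a_exp, x2_exp, lam, 10), 0.5 ** 11)
```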


7. Waiting times. Suppose a customer arrives at the system (line plus server) at time $t \ge 0$, and that "first come, first served" dictates the order of service. Then in order to reach the server he must wait a time equal to the unelapsed service time of the customer currently being served, plus the service times of those customers ahead of him in line.

Let $X(t)$ be the unelapsed service time of the customer occupying the server at time $t$. Then, given that the last previous departure from the system occurred at $t - \tau$,

$$\Pr[X(t) \le a] = \frac{U(\tau + a) - U(\tau)}{1 - U(\tau)}. \tag{7.1}$$


Referring to Section 3, in particular to the developments leading to (3.17)-(3.19), we can derive an expression for the joint d.f. of $N(t)$ and $X(t)$ at time $t$ after the beginning of a busy period. The result is easily seen to be

P;;(a, t) = Pr[(0 < X(t) S a, N(t) = j, 
(7.2) Nit’) > OO Sst St)|N(To) =i, T, = 0] 
(.4 
I 


Dd Pi(t)[{U(t + a) — Ult)jay--(t)], 


j=l 


II 


We again find the introduction of transforms useful. It can be verified that

$$\int_0^\infty e^{-st}\int_0^\infty e^{-\zeta a}\,d_a[U(t + a) - U(t)]\,dt = \frac{u(s) - u(\zeta)}{\zeta - s}, \tag{7.3}$$

assumed to converge at least for $\zeta, s \ge 0$. Since the first moment of $U(t)$ is assumed finite, the limit of (7.3) exists when $\zeta - s$ tends to zero. Let


$$p_{ij}(\zeta, s) = \int_0^\infty\int_0^\infty e^{-st} e^{-\zeta a}\,d_a P_{ij}(a, t)\,dt \tag{7.4}$$

and

$$g_i(x; \zeta, s) = \sum_{j=1}^\infty x^j p_{ij}(\zeta, s). \tag{7.5}$$

Transforming throughout (7.2), we obtain, using (2.2), (7.3), and the familiar properties of L.S.T.'s and G.F.'s of convolutions,

$$g_i(x; \zeta, s) = g_i(x; s)\,\frac{u[s + \lambda\{1 - b(x)\}] - u(\zeta)}{\zeta - s - \lambda\{1 - b(x)\}}, \tag{7.6}$$

where $g_i(x; s)$ is given by (3.9), (3.10), and (3.12) after setting $z = 1$.

The total unelapsed service time at $t$, $W(t)$, is $X(t)$ plus the sum of the service times of the $N(t) - 1$ customers in line:

$$P_i(a, t) = \Pr[0 < W(t) \le a,\ N(t') > 0\ (0 \le t' \le t) \mid N(T_0) = i, T_0 = 0] = \sum_{j=1}^\infty \int_0^a U^{(j-1)*}(a - y)\,d_y P_{ij}(y, t). \tag{7.7}$$
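An expression for the transform

$$p_i(\zeta, s) = \int_0^\infty\int_0^\infty e^{-st} e^{-\zeta a}\,d_a P_i(a, t)\,dt \tag{7.8}$$

comes directly from (7.6). After transformation with respect to $a$ (L.S.T.) and $t$ (L.T.), the right-hand side of (7.7) is seen to be the generating function $g_i(x; \zeta, s)$ with $x$ replaced by $u(\zeta)$, the whole then divided by $u(\zeta)$. After a little simplification we have

$$p_i(\zeta, s) = \frac{g_i(u(\zeta); s)}{u(\zeta)}\cdot\frac{u[s + \lambda\{1 - b(u(\zeta))\}] - u(\zeta)}{\zeta - s - \lambda\{1 - b(u(\zeta))\}}. \tag{7.9}$$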

The transform of the more involved joint d.f. of $M(t)$, the number of customers who have received service by time $t$; $N(t)$, the number present in the system at $t$; and $W(t)$, for the busy period process, is easily obtained. It is only necessary to replace $g_i(x; s)$ by $g_i(z, x; s)$ in (7.6) to include $M(t)$, and to replace $x$ by $x u(\zeta)$ in the same expression, afterwards dividing by $u(\zeta)$, to account for $N(t)$.

In practice interest frequently centers around $W(t)$ for the general process described in Section 5. Let

$$Q_i(a, t) = \Pr[0 < W(t) \le a \mid N(T_0) = i, T_0 = 0]. \tag{7.10}$$

The enumerative argument leading to Theorem 6 can be used to show







THEOREM 12. The d.f. of $W(t)$ for the general process is expressible in terms of the d.f. of $W(t)$ for the busy period process and the renewal distribution:

$$Q_i(a, t) = P_i(a, t) + \sum_{k=2}^\infty P(a, t) * r_k(t; i), \qquad a > 0, \tag{7.11}$$

where

$$P(a, t) = \sum_{i=1}^\infty b_i P_i(a, t).$$

Since $W(t) = 0$ if and only if $N(t) = 0$, we have $Q_i(0, t) = Q_{i0}(t)$; see (5.15). The ergodic properties of $W(t)$ for the general process follow from those of $N(t)$. The limiting d.f. of $W(t)$ comes from (7.11), and we have

$$q(a) = \lim_{t\to\infty} Q_i(a, t) = \begin{cases} \lambda(1-\rho)\displaystyle\int_0^\infty P(a, t)\,dt, & \rho < 1, \\ 0, & \rho \ge 1. \end{cases} \tag{7.12}$$

The L.S.T., $q(\zeta)$, of the limiting d.f. $q(a)$ comes from (7.9) by letting $s$ tend to zero and dividing by $E(\tau_R)$. Justification follows almost exactly that for (6.7). We obtain, after adding the long-run probability $q_0$ that $W(t) = 0$,


$$w(\zeta) = q_0 + q(\zeta) = \frac{(1-\rho)\,\zeta}{\zeta - \lambda[1 - b\{u(\zeta)\}]} = \frac{1-\rho}{1 - \rho C(\zeta)}, \tag{7.13}$$

where

$$C(\zeta) = \frac{1}{\beta_1 m_1}\Bigl[\frac{1 - b\{u(\zeta)\}}{\zeta}\Bigr] \tag{7.14}$$

is the L.S.T. of an (absolutely continuous) d.f. The expression (7.13) may be written as

$$w(\zeta) = (1-\rho)\sum_{n=0}^\infty \rho^n C^n(\zeta), \tag{7.15}$$

which shows that the limiting d.f. of $W$ has a single jump at the origin, equal to $(1 - \rho)$, and is absolutely continuous elsewhere; cf. Beneš [2]. Notice that if all departures from pure Poisson arrivals are due to bunches arriving together,

$$w(\zeta) = \frac{(1-\rho)\,\zeta}{\zeta - \lambda + \lambda b\{u(\zeta)\}}, \tag{7.16}$$

which is essentially the Pollaczek-Khintchine formula, cf. Kendall [7], [8], with "customers" now made up of the bunches of individuals arriving simultaneously.

The moments of the d.f. $q(a)$ come from (7.16) by differentiation with respect to $\zeta$, the derivatives being evaluated at $\zeta = 0$. For example,

$$E(W) = -w'(0) = \frac{\rho\,m_1}{2(1-\rho)}\Bigl(\frac{\beta_2}{\beta_1} + \frac{\sigma^2}{m_1^2}\Bigr) \tag{7.17}$$

and

$$\operatorname{Var}[W] = \frac{\rho}{(1-\rho)^2}\bigl[\rho C_1^2 + (1-\rho)C_2\bigr], \tag{7.18}$$

where $\{C_i\}$ is the sequence of moments about the origin of the d.f. whose transform is $C(\zeta)$.
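A short sketch can evaluate (7.17) and (7.18). It uses a fact not stated in the text: (7.14) is the L.S.T. of the stationary-excess (equilibrium) d.f. of $S$, the total service demanded by one bunch. Its moments are therefore $C_k = E[S^{k+1}]/\{(k+1)E[S]\}$, and $E[S^k]$ follows from the moments of $B$ and of a single service time. The third moments $\beta_3$ and $m_3$ are needed only for the variance, and the function name is hypothetical. The first check is M/M/1 with $\lambda = 0.5$, where the workload is 0 with probability $1-\rho$ and otherwise exponential with rate $1-\rho$, giving mean 1 and variance 3. The second check confirms that $\rho C_1/(1-\rho)$ agrees with (7.17) for assumed geometric bunches ($\beta_1 = 2$, $\beta_2 = 6$, $\beta_3 = 26$).

```python
def workload_moments(lam, beta1, beta2, beta3, m1, m2, m3):
    """E(W) and Var[W] for the limiting d.f. (7.13), via (7.15) and (7.18).

    C(zeta) in (7.14) is taken as the L.S.T. of the stationary-excess d.f. of
    S = total service demanded by one bunch, whose k-th moment is
    E[S**(k+1)] / ((k+1) E[S]).
    """
    es1 = beta1 * m1
    es2 = beta1 * m2 + (beta2 - beta1) * m1 ** 2
    es3 = (beta1 * m3 + 3 * (beta2 - beta1) * m2 * m1
           + (beta3 - 3 * beta2 + 2 * beta1) * m1 ** 3)
    rho = lam * es1
    if rho >= 1:
        raise ValueError("the limiting d.f. exists only for rho < 1")
    c1 = es2 / (2 * es1)
    c2 = es3 / (3 * es1)
    mean_w = rho * c1 / (1 - rho)  # mean of the geometric mixture (7.15)
    var_w = rho / (1 - rho) ** 2 * (rho * c1 ** 2 + (1 - rho) * c2)  # (7.18)
    return mean_w, var_w


print(workload_moments(0.5, 1, 1, 1, 1.0, 2.0, 6.0))  # M/M/1 with rho = 0.5
mean_geo, _ = workload_moments(0.3, 2, 6, 26, 1.0, 2.0, 6.0)
rho = 0.6
print(mean_geo, rho * 1.0 / (2 * (1 - rho)) * (6 / 2 + 1.0))  # same value as (7.17)
```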

An approximation to $1 - \{q_0 + q(a)\} = \Pr[W > a]$, valid asymptotically as $a$ increases, can be obtained by methods similar to those of Lemma 6.1 and Theorem 10. We state

LEMMA 7.1. Suppose $\rho < 1$. If, for complex $\zeta$, $b\{u(\zeta)\}$ and $u(\zeta)$ converge for $-L < \operatorname{Re}(\zeta) < \infty$ ($L > 0$), then the denominator of (7.13), $D(\zeta) = \zeta - \lambda[1 - b\{u(\zeta)\}]$, has two real zeros, $\zeta_1 = 0$ and $\zeta_2 < 0$, such that if $\zeta_3$ is any other zero of $D(\zeta)$, $\operatorname{Re}(\zeta_3) < \zeta_2$. The proof is omitted. Note that our assumptions imply a restriction on $U(t)$; see Lemma 6.1.

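We now apply the complex inversion formula for Laplace transforms, cf. Widder [14],

$$q_0 + q(a) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{\zeta a}\,\frac{w(\zeta)}{\zeta}\,d\zeta, \qquad c > 0. \tag{7.19}$$

Consider the rectangle in the $\zeta$-plane with corners at $c \pm iT$ and $\zeta_2 - \delta \pm iT$, where $\delta > 0$ is chosen so that $\operatorname{Re}(\zeta_3) < \zeta_2 - \delta < \zeta_2 < 0$ and $\zeta_2 - \delta > -L$. Integrate $e^{\zeta a} w(\zeta)/\zeta$ around this contour and let $T$ tend to infinity. From Cauchy's theorem,

$$q_0 + q(a) = a_1 + a_2 e^{\zeta_2 a} + \frac{1}{2\pi i}\int_{\zeta_2-\delta-i\infty}^{\zeta_2-\delta+i\infty} e^{\zeta a}\,\frac{w(\zeta)}{\zeta}\,d\zeta, \tag{7.20}$$

where $a_1$ and $a_2$ are the residues of $w(\zeta)/\zeta$ at the poles $\zeta_1 = 0$ and $\zeta_2 < 0$. We have $a_1 = 1$, and

$$a_2 = \frac{1-\rho}{1 + \lambda u'(\zeta_2)\,b'\{u(\zeta_2)\}}.$$

Since $w(\zeta)$ is bounded on the line of integration in (7.20), we have

$$q_0 + q(a) = a_1 + a_2 e^{\zeta_2 a} + O\bigl(e^{(\zeta_2 - \delta)a}\bigr), \tag{7.21}$$

and thus

THEOREM 13. The probability of waiting times exceeding $a$, when the d.f. (7.12) can be assumed to apply, is asymptotically exponential:

$$1 - \{q_0 + q(a)\} \sim -a_2 e^{\zeta_2 a} \qquad (a \to \infty).$$

Compare Theorem 10 and the results of Smith [11].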
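The constants $\zeta_2$ and $a_2$ of Theorem 13 can be computed numerically in the same spirit as the sketch for Theorem 10. Because $w(\zeta)/\zeta = (1-\rho)/D(\zeta)$, the residue at $\zeta_2$ is $(1-\rho)/D'(\zeta_2)$, and $D'(\zeta_2) = 1 + \lambda u'(\zeta_2) b'\{u(\zeta_2)\}$ as in the display for $a_2$. The sketch assumes single arrivals with $\lambda = 0.5$ and unit-mean service. The bisection bracket must be supplied by the user, and the function name is hypothetical. For M/M/1 the result is $\zeta_2 = -(1-\rho) = -0.5$ and $a_2 = -\rho = -0.5$, which reproduces the exact law $\Pr[W > a] = \rho e^{-(1-\rho)a}$.

```python
import math


def waiting_time_tail(u, b, lam, rho, lo, hi, iters=200, h=1e-6):
    """zeta_2 and a_2 of Lemma 7.1 and Theorem 13.

    Assumes lo < zeta_2 < hi < 0 with D(lo) > 0 > D(hi), where
    D(zeta) = zeta - lam * (1 - b(u(zeta))) is the denominator of (7.13).
    """
    def D(z):
        return z - lam * (1 - b(u(z)))

    for _ in range(iters):  # bisection for the negative real zero zeta_2
        mid = 0.5 * (lo + hi)
        if D(mid) > 0:
            lo = mid
        else:
            hi = mid
    z2 = 0.5 * (lo + hi)
    d_prime = (D(z2 + h) - D(z2 - h)) / (2 * h)  # D'(zeta_2), central difference
    a2 = (1 - rho) / d_prime                     # residue of w(zeta)/zeta = (1 - rho)/D(zeta)
    return z2, a2


single = lambda x: x
z2, a2 = waiting_time_tail(lambda z: 1 / (1 + z), single, 0.5, 0.5, -0.99, -0.01)
z2d, a2d = waiting_time_tail(lambda z: math.exp(-z), single, 0.5, 0.5, -3.0, -0.01)
print(z2, a2)    # exponential service: Pr[W > a] ~ -a2 * exp(z2 * a)
print(z2d, a2d)  # constant service: faster decay (more negative zeta_2)
```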







8. Acknowledgements. The writer is indebted to Professor William Feller, 
Dr. Robert Hooke, and the referee for comments leading to clarification of the 
original presentation. 


REFERENCES 
[1] N. T. J. Bailey, "A continuous time treatment of a simple queue using generating functions", J. Royal Stat. Soc., Series B, Vol. 15 (1954), pp. 288-291.
[2] V. Beneš, "On queues with Poisson arrivals", Ann. Math. Stat., Vol. 28 (1957), pp. 670-677.
[3] D. Blackwell, "A renewal theorem", Duke Math. J., Vol. 15 (1948), pp. 145-150.
[4] J. L. Doob, Stochastic Processes, John Wiley and Sons, Inc., New York, 1953.
[5] W. Feller, "On the integral equation of renewal theory", Ann. Math. Stat., Vol. 12 (1941), pp. 243-267.
[6] W. Feller, An Introduction to Probability Theory and Its Applications (Second Edition), John Wiley and Sons, Inc., New York, 1957.
[7] D. G. Kendall, "Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain", Ann. Math. Stat., Vol. 24 (1953), pp. 338-354.
[8] D. G. Kendall, "Some problems in the theory of queues", J. Royal Stat. Soc., Series B, Vol. XIII (1951), pp. 151-185 (with references and discussion).
[9] W. L. Smith, "Extensions of a renewal theorem", Proc. Cambridge Phil. Soc., Vol. 51 (1955), pp. 629-638.
[10] W. L. Smith, "Asymptotic renewal theorems", Proc. Royal Soc. Edinburgh, Sec. A, Vol. LXIV (1954), pp. 9-48.
[11] W. L. Smith, "On the distribution of queueing times", Proc. Cambridge Phil. Soc., Vol. 49 (1953), pp. 449-461.
[12] L. Takács, "Investigation of waiting time problems by reduction to Markov processes", Acta Math. (Budapest), Vol. 6 (1955), pp. 101-129.
[13] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge, 1946.
[14] D. V. Widder, The Laplace Transform, Princeton, 1941.







ASYMPTOTIC EXPANSIONS IN GLOBAL CENTRAL LIMIT 
THEOREMS 


By RALPH PALMER AGNEW¹
Cornell University 


1. Introduction. Let $\xi_1, \xi_2, \cdots$ be independent random variables having the same d.f. (distribution function) $F(x)$. We suppose that

$$\int_{-\infty}^\infty x\,dF(x) = 0, \qquad \int_{-\infty}^\infty x^2\,dF(x) = 1, \tag{1.1}$$

so that $F(x)$ has mean 0 and standard deviation 1. Let $F_n(x)$ denote the d.f. of the normalized sum

$$(\xi_1 + \xi_2 + \cdots + \xi_n)/n^{1/2}. \tag{1.2}$$

A special case of the central limit theorem then asserts that, for each individual $x$ in the interval $-\infty < x < \infty$,

$$\lim_{n\to\infty} F_n(x) = \Phi(x), \tag{1.21}$$

where $\Phi(x)$ is the Gaussian d.f. defined by

$$\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-u^2/2}\,du. \tag{1.22}$$

It is our purpose to study the behavior as $n \to \infty$ of the constants $C_n$ defined by

$$C_n = \int_{-\infty}^\infty |F_n(x) - \Phi(x)|^2\,dx. \tag{1.3}$$

For each $p > 0$, let constants $C_n(p)$ be defined by

$$C_n(p) = \int_{-\infty}^\infty |F_n(x) - \Phi(x)|^p\,dx \tag{1.31}$$

when these integrals exist, that is, are finite. It is known from [1] and [2] that the hypotheses (1.1) imply that if $p > 1/2$ then the constants $C_n(p)$ exist and $\lim_{n\to\infty} C_n(p) = 0$. Beyond this, not very much is known about the constants $C_n(p)$. The moments $\alpha_k$ and the absolute moments $\beta_k$ of $F(x)$ are defined by

$$\alpha_k = \int_{-\infty}^\infty x^k\,dF(x), \qquad \beta_k = \int_{-\infty}^\infty |x|^k\,dF(x) \tag{1.4}$$
Received March 11, 1957. 


1 The research of this author was supported in part by the United States Air Force under 
Contract No. AF18(600)-685 monitored by the Office of Scientific Research. 







722 RALPH PALMER AGNEW 
when these integrals exist. If $\beta_3$ exists, then an inequality of Esseen ([5], p. 78) shows that there is a constant $K(\beta_3)$, depending only upon $\beta_3$, such that


K (8s) log (2 + ||) 


(1.5) \F.(2) — (2) | ~ isle 


IIA 


This implies that if $\beta_3$ exists, then the constants $C_n(p)$ exist when $p > 1/2$ and $C_n(p) = O(n^{-p/2})$. In particular, if $\beta_3$ exists then $C_n = O(n^{-1})$. There is a sense in which this result cannot be improved, because it is shown in [2] that if $F(x)$ is the symmetric binomial d.f. satisfying (1.1), then


' iy 1 
(1.6) c= 1h +0(). 


The only other case in which the constants $C_n$ have been appraised is that for which $F(x)$ is the d.f. of a random variable $\xi$ uniformly distributed over $-a \le x \le a$; in this case (1.1) implies that $a = 3^{1/2}$, and it is shown in [2] that


wa 1 
61) Com STi ORE. 
— n? 12800) * 6) 


One of our main purposes is to give conditions under which there exist constants $D_1, D_2, D_3, \cdots$ such that the expansion

$$C_n = \frac{D_1}{n} + \frac{D_2}{n^2} + \cdots + \frac{D_k}{n^k} + O\Bigl(\frac{1}{n^{k+1}}\Bigr)$$

is valid for each $k = 1, 2, 3, \cdots$, and to give explicit expressions for $D_1$ and $D_2$. Such results are given at the ends of sections 4 and 6. Binomial distributions are treated in section 7, and the symmetric binomial d.f. is treated more extensively in section 9.
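For the symmetric binomial case mentioned above, $C_n$ can be computed directly from definition (1.3). $F_n$ is a step function with atoms at $(2j - n)/n^{1/2}$, so the integral splits into pieces on which $F_n$ is constant. The sketch below, our own numerical illustration with hypothetical helper names, integrates each piece by Simpson's rule. It shows that $nC_n$ levels off, consistent with $C_n = O(n^{-1})$. It makes no claim about the value of the limiting constant in (1.6).

```python
import math


def gaussian_cdf(x):
    """Phi(x) of (1.22)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, m=16):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def c_n_symmetric_binomial(n, pad=10.0):
    """C_n of (1.3) when each xi_k is +1 or -1 with probability 1/2 (float range: n <= 512)."""
    root_n = math.sqrt(n)
    atoms = [(2 * j - n) / root_n for j in range(n + 1)]  # support of F_n
    masses = [math.comb(n, j) / 2.0 ** n for j in range(n + 1)]
    # left of the first atom F_n = 0; right of the last atom F_n = 1
    total = simpson(lambda x: gaussian_cdf(x) ** 2, atoms[0] - pad, atoms[0], 400)
    total += simpson(lambda x: (1.0 - gaussian_cdf(x)) ** 2, atoms[-1], atoms[-1] + pad, 400)
    cum = 0.0
    for j in range(n):
        cum += masses[j]  # F_n equals cum on [atoms[j], atoms[j+1])
        total += simpson(lambda x, c=cum: (c - gaussian_cdf(x)) ** 2, atoms[j], atoms[j + 1])
    return total


c16, c64, c256 = (c_n_symmetric_binomial(n) for n in (16, 64, 256))
print(16 * c16, 64 * c64, 256 * c256)  # n * C_n settles near a constant
```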


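2. Formulas for the constants $C_n$. Information about the constants $C_n$ is obtainable by use of the c.f. (characteristic function) $\varphi(t)$ of $F(x)$, which is defined by

$$\varphi(t) = \int_{-\infty}^\infty e^{itx}\,dF(x). \tag{2.01}$$

It is shown in [2] that

$$\int_{-\infty}^\infty |F_n(x) - \Phi(x)|^2\,dx = \frac{1}{\pi}\int_0^\infty \bigl|[\varphi(t/n^{1/2})]^n - e^{-t^2/2}\bigr|^2\,\frac{dt}{t^2}. \tag{2.02}$$

Hence

$$C_n = \frac{1}{n^{1/2}\pi}\int_0^\infty \bigl|[\varphi(t)]^n - e^{-nt^2/2}\bigr|^2\,\frac{dt}{t^2}. \tag{2.1}$$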


The hypotheses (1.1) imply that, at least when $k = 0, 1, 2$,

$$\varphi^{(k)}(t) = \int_{-\infty}^\infty (ix)^k e^{itx}\,dF(x), \tag{2.2}$$





GLOBAL CENTRAL LIMIT THEOREMS 723 


and hence that $\varphi(0) = 1$, $\varphi'(0) = 0$, and $\varphi''(0) = -1$. This implies that we can choose a positive constant $T$ such that

$$|\varphi(t)| < 1 \qquad (0 < t \le T). \tag{2.21}$$

Let constants $\delta_1, \delta_2, \cdots$ be defined by

$$\delta_n = (\log n)/n^{1/2} \qquad (n = 1, 2, 3, \cdots). \tag{2.22}$$

For values of $n$ so great that $0 < \delta_n < T$, we split $C_n$ into the sum of four terms by putting

(2.3) 0 te co 4, c® i c® 4 co’ 

where 

i 


‘ va) 
(2.4) Cc: = = 


: I (ene? — ee — [o(t)]" le —nt?/2 o 


“ [o(t)]” — corr dt 


2’ 


“let aS: 


Estimation of C‘? and C? offers no aa as we shall see, 
2.81) c® = o(n™), C® = o(n™) 


where $o(n^{-\infty})$ denotes a quantity which is $o(n^{-k})$ for each fixed positive constant $k$. Since $\varphi(0) = 1$ and $|\varphi(t)| \le 1$, it follows from (2.4) that


ice! < ] 3 
iv = al x 
n — 


"3 ent) dt 
(2.82) 
isl 
ni wr 5°, 
To estimate C%’, let y(t) = | o(t) |*. Then ¥(0) = 1, ¥/(0) = 0, and y’(0) = 


—2. This and the fact that y(t) < 1 when 0 < ¢ S T imply that, for each 
sufficiently great n, 


ont) ar id - ane — O(n’). 


(2.83) max |¥(t)| <1 — 4,/2. 
6,5¢sT 


Hence, when n is sufficiently great, 


eT 
jc? | <- tf (1 — 8/2)%x" at 
tar Jz, 


 ¢ 5, \" n'T (log n)* |" 
7 4 a 1 “) 
. n} 36? (1 ) n(log n)? [1 - 2n |- o(n*) 


This proves (2.81). 


(2.84) 






The problem of estimating $C_n$ is therefore reduced to the problem of estimating $C_n^{(3)}$ and $C_n^{(4)}$. Instead of (2.3), we henceforth use the formula

$$C_n=o(n^{-\infty})+C_n^{(3)}+C_n^{(4)}. \tag{2.9}$$


3. The constants $C_n^{(3)}$. In this section we appraise the constants $C_n^{(3)}$ in terms of the Thiele [7] semi-invariants $\gamma_k$ of $F(x)$. We suppose that, for some integer $m$ for which $m\ge 3$, the moments $\alpha_m$ and $\beta_m$ exist. Then, as $t\to 0$,

$$\phi(t)=1-\frac{t^2}{2}+\sum_{k=3}^{m}\alpha_k\frac{(it)^k}{k!}+o(t^m), \tag{3.1}$$

and using the ordinary expansion of $\log(1+x)$ in powers of $x$ gives

$$\log\phi(t)=-\frac{t^2}{2}+\sum_{k=3}^{m}\gamma_k\frac{(it)^k}{k!}+o(t^m), \tag{3.2}$$

where

$$\gamma_3=\alpha_3,\quad\gamma_4=\alpha_4-3,\quad\gamma_5=\alpha_5-10\alpha_3,\quad\gamma_6=\alpha_6-15\alpha_4-10\alpha_3^2+30,\ \ldots. \tag{3.21}$$

The constants $\gamma_3,\gamma_4,\ldots$ are the Thiele semi-invariants of $F(x)$, which are treated in the books of Cramér [3], [4] and Gnedenko and Kolmogoroff [6] and which have simplified forms here because $\alpha_0=1$, $\alpha_1=0$, and $\alpha_2=1$. From (3.2) we obtain, for each fixed $n$,

$$[\phi(t)]^n=e^{-nt^2/2}e^{w}, \tag{3.3}$$

where $w$ is the function of $n$ and $t$ defined by

$$w=n\left[\sum_{k=3}^{m}\gamma_k\frac{(it)^k}{k!}+o(t^m)\right]. \tag{3.31}$$
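The relations (3.21) are easy to check mechanically. The sketch below (Python, exact rational arithmetic) evaluates them for a d.f. with $\alpha_1=0$ and $\alpha_2=1$; the function name `semi_invariants` and the two sample distributions are illustrative choices, not part of the text.

```python
from fractions import Fraction

def semi_invariants(alpha):
    """Thiele semi-invariants gamma_3..gamma_6 from (3.21).

    `alpha` maps k -> alpha_k for k = 3..6; the d.f. is assumed (not checked)
    to have alpha_1 = 0 and alpha_2 = 1.
    """
    a3, a4, a5, a6 = (Fraction(alpha[k]) for k in (3, 4, 5, 6))
    return {
        3: a3,
        4: a4 - 3,
        5: a5 - 10 * a3,
        6: a6 - 15 * a4 - 10 * a3**2 + 30,
    }

# Symmetric Bernoulli d.f. (values -1 and +1): even moments 1, odd moments 0.
print(semi_invariants({3: 0, 4: 1, 5: 0, 6: 1}))
# Uniform d.f. on [-3**0.5, 3**0.5]: alpha_k = 3**(k/2) / (k + 1) for even k.
print(semi_invariants({3: 0, 4: Fraction(9, 5), 5: 0, 6: Fraction(27, 7)}))
```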


From (2.6) and (3.3) we find that

$$C_n^{(3)}=\frac{1}{n^{1/2}\pi}\int_0^{\delta_n}e^{-nt^2}\left|e^{w}-1\right|^2\frac{dt}{t^2}. \tag{3.4}$$

Supposing henceforth that $0<t<\delta_n$, we see from (2.22) that

$$0<nt^3<(\log n)^3/n^{1/2}$$

and hence from (3.31) that $w=o(1)$ as $n\to\infty$. Therefore we can use the formula

$$e^{w}-1=\sum_{k=1}^{m-1}\frac{w^k}{k!}+O(w^m) \tag{3.41}$$

and (3.31) to obtain a formula giving $e^{w}-1$ as the sum of a finite number of terms involving $n$, $t$, and $\gamma_3,\gamma_4,\ldots,\gamma_m$. When this finite sum is written down, it is found that





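$$e^{w}-1=u+iv,\qquad\left|e^{w}-1\right|^2=u^2+v^2, \tag{3.42}$$

where

$$u=\frac{\gamma_4}{24}nt^4-\frac{\gamma_3^2}{72}n^2t^6+\cdots, \tag{3.43}$$

$$v=-\frac{\gamma_3}{6}nt^3+\frac{\gamma_5}{120}nt^5-\frac{\gamma_3\gamma_4}{144}n^2t^7+\frac{\gamma_3^3}{1296}n^3t^9+\cdots. \tag{3.44}$$

In (3.43), (3.44), and formulas which follow, the final dots represent finite sums of terms which turn out to give contributions to $C_n^{(3)}$ which are of lower orders of magnitude than the contributions of the terms which precede the dots. From (3.42), (3.43), and (3.44) we obtain

$$\left|e^{w}-1\right|^2=\frac{\gamma_3^2}{36}n^2t^6+\frac{5\gamma_4^2-8\gamma_3\gamma_5}{2880}n^2t^8+\frac{\gamma_3^2\gamma_4}{864}n^3t^{10}-\frac{\gamma_3^4}{15552}n^4t^{12}+\cdots. \tag{3.45}$$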

From (3.4) and (3.45) we see that $C_n^{(3)}$ is a linear combination, with coefficients depending upon $\gamma_3,\gamma_4,\ldots$, of integrals of the form

$$J_n=\frac{1}{n^{1/2}}\int_0^{\delta_n}e^{-nt^2}n^pt^{2q}\,dt, \tag{3.5}$$

where $p$ and $q$ are nonnegative integers. Putting $t=n^{-1/2}u$ in (3.5) and using (2.22) gives

$$J_n=\frac{1}{n^{1+q-p}}\int_0^{\log n}e^{-u^2}u^{2q}\,du. \tag{3.51}$$

But, when $n$ is sufficiently great,

$$\int_{\log n}^{\infty}e^{-u^2}u^{2q}\,du=\int_{\log n}^{\infty}\left(e^{-u^2/2}u^{2q-1}\right)e^{-u^2/2}u\,du\le\int_{\log n}^{\infty}e^{-u^2/2}u\,du=e^{-(\log n)^2/2}=o(n^{-\infty}). \tag{3.52}$$

Hence

$$J_n=o(n^{-\infty})+\frac{1}{n^{1+q-p}}\int_0^{\infty}e^{-u^2}u^{2q}\,du. \tag{3.53}$$

Using a standard formula for the integral in (3.53) gives

$$J_n=o(n^{-\infty})+\frac{\pi^{1/2}}{n^{1+q-p}}\cdot\frac{(2q)!}{q!\,2^{2q+1}}. \tag{3.54}$$
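The standard formula behind (3.54) is $\int_0^\infty e^{-u^2}u^{2q}\,du=\pi^{1/2}(2q)!/(q!\,2^{2q+1})$. A quick numerical confirmation with Simpson's rule is sketched below; the cutoff 12 and the 20000 subintervals are arbitrary choices that make the truncation and discretization errors negligible.

```python
import math

def gaussian_moment_closed(q: int) -> float:
    """Closed form used in (3.54): integral_0^inf exp(-u^2) u^(2q) du."""
    return math.sqrt(math.pi) * math.factorial(2 * q) / (math.factorial(q) * 2 ** (2 * q + 1))

def gaussian_moment_simpson(q: int, upper: float = 12.0, m: int = 20000) -> float:
    """Composite Simpson rule on [0, upper]; the tail beyond `upper` is negligible."""
    h = upper / m
    f = lambda u: math.exp(-u * u) * u ** (2 * q)
    s = f(0.0) + f(upper)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3.0

for q in range(6):
    print(q, gaussian_moment_closed(q), gaussian_moment_simpson(q))
```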


Use of (3.4), (3.45), (3.5), and (3.54) gives

$$C_n^{(3)}=\frac{A_1}{n}+\frac{A_2}{n^2}+\cdots+\frac{A_k}{n^k}+O\!\left(\frac{1}{n^{k+1}}\right), \tag{3.6}$$

where

$$A_1=\frac{\gamma_3^2}{96\pi^{1/2}}, \tag{3.61}$$

$$A_2=\frac{1}{\pi^{1/2}}\left[\frac{5\gamma_4^2}{3072}-\frac{\gamma_3\gamma_5}{384}+\frac{35\gamma_3^2\gamma_4}{9216}-\frac{35\gamma_3^4}{36864}\right], \tag{3.62}$$

and each of the constants $A_1,A_2,A_3,\ldots$ depends only upon a finite number of the semi-invariants $\gamma_3,\gamma_4,\ldots$. In case the given d.f. $F(x)$ has finite moments of all positive integer orders, the integer $m$ of this section can be chosen as great as we wish, and (3.6) is then valid for each $k=1,2,3,\ldots$. The expressions for $A_1$ and $A_2$ given in (3.61) and (3.62) are particularly simple in the important case in which $F(x)$ is symmetric, because in this case $\gamma_3=0$. The complexity of the expression for $A_k$ increases very rapidly as $k$ increases.
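With $\gamma_3=0$ (the symmetric case), (3.62) reduces to $A_2=5\gamma_4^2/(3072\pi^{1/2})$. A sketch that evaluates (3.61) and (3.62) from given semi-invariants, keeping the bracketed rational parts exact, is below; the function names are illustrative.

```python
import math
from fractions import Fraction

def a_rational(g3, g4, g5):
    """Rational parts of (3.61)-(3.62), i.e. pi**0.5 * A_1 and pi**0.5 * A_2."""
    g3, g4, g5 = Fraction(g3), Fraction(g4), Fraction(g5)
    a1 = g3 ** 2 / 96
    a2 = (5 * g4 ** 2 / 3072 - g3 * g5 / 384
          + 35 * g3 ** 2 * g4 / 9216 - 35 * g3 ** 4 / 36864)
    return a1, a2

def a_constants(g3, g4, g5):
    """A_1 and A_2 as floating-point numbers."""
    a1, a2 = a_rational(g3, g4, g5)
    root_pi = math.sqrt(math.pi)
    return float(a1) / root_pi, float(a2) / root_pi

# Symmetric Bernoulli d.f.: gamma_3 = gamma_5 = 0 and gamma_4 = -2.
print(a_rational(0, -2, 0), a_constants(0, -2, 0))
```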
4. The constants $C_n^{(4)}$; case $\limsup_{t\to\infty}|\phi(t)|<1$. In this section, we suppose that $F(x)$ is a d.f. having a c.f. $\phi(t)$ for which

$$\limsup_{t\to\infty}|\phi(t)|<1, \tag{4.1}$$

and show that in this case

$$C_n^{(4)}=o(n^{-\infty}), \tag{4.2}$$

where, as above, $o(n^{-\infty})$ denotes a quantity which is $o(n^{-k})$ for each positive constant $k$.

It is known ([3], p. 26) that if a c.f. $\phi(t)$ satisfies the hypothesis (4.1), then $|\phi(t)|<1$ when $t>0$. Since each c.f. is everywhere continuous, the hypothesis (4.1) therefore implies that if $T>0$, then there is a constant $\theta$ such that $0<\theta<1$ and $|\phi(t)|\le\theta$ when $t\ge T$. The definition (2.7) of $C_n^{(4)}$ therefore implies that

$$C_n^{(4)}\le\frac{1}{n^{1/2}\pi}\int_T^{\infty}\theta^{2n}\frac{dt}{t^2}=\frac{\theta^{2n}}{n^{1/2}\pi T}, \tag{4.3}$$

and the desired conclusion (4.2) follows.

Thus in case $F(x)$ has a nonvanishing absolutely continuous component ([3], page 17 and page 25) and in other cases where (4.1) holds, we have

$$C_n=o(n^{-\infty})+C_n^{(3)}, \tag{4.4}$$

and the results of section 3 suffice for the estimation of $C_n$. In particular, if (4.1) holds and $F(x)$ has finite moments of all positive integer orders, then (1.7) is valid when the constants $D_1,D_2,\ldots$ are the constants $A_1,A_2,\ldots$ in (3.6).





5. The uniform distribution. Let $F(x)$ be the d.f. of a random variable $\xi$ uniformly distributed over $-a\le x\le a$, so that $F(x)=0$ when $x\le -a$, $F(x)=(x+a)/2a$ when $-a\le x\le a$, and $F(x)=1$ when $x\ge a$. This d.f. has mean 0, and we assume that $a=3^{1/2}$ so that the standard deviation is 1. In this case (4.4) holds. The moments $\alpha_1,\alpha_2,\ldots$ defined by (1.4) are

$$\alpha_k=\int_{-a}^{a}\frac{x^k}{2a}\,dx, \tag{5.1}$$

so that $\alpha_k=0$ when $k$ is odd and $\alpha_k=a^k/(k+1)$ when $k$ is even; in particular $\alpha_4=9/5$. Using (3.21) gives $\gamma_3=0$ and $\gamma_4=-6/5$. Using (4.4) and (3.6) then gives the result (1.61).
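For the uniform d.f. the moments (5.1) are rational for even $k$ because $a^2=3$. The sketch below recomputes $\gamma_4$ from (3.21) and the rational parts of (3.61) and (3.62); it is only a check of the arithmetic, under the assumption that (3.62) applies with $\gamma_3=\gamma_5=0$.

```python
import math
from fractions import Fraction

def uniform_moment(k: int) -> Fraction:
    """alpha_k of (5.1) for the uniform d.f. on [-a, a] with a = 3**0.5.

    Odd moments vanish; for even k, alpha_k = a**k / (k + 1) = 3**(k//2) / (k + 1).
    """
    if k % 2:
        return Fraction(0)
    return Fraction(3 ** (k // 2), k + 1)

gamma3 = uniform_moment(3)              # gamma_3 = alpha_3
gamma4 = uniform_moment(4) - 3          # from (3.21)
A1 = gamma3 ** 2 / 96                   # pi**0.5 * A_1, from (3.61)
A2 = 5 * gamma4 ** 2 / 3072             # pi**0.5 * A_2, from (3.62) with gamma_3 = gamma_5 = 0
print(gamma4, A1, A2, float(A2) / math.sqrt(math.pi))
```

The last printed number is $A_2$ itself, $3/(1280\pi^{1/2})\approx .0013223$.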


6. The constants $C_n^{(4)}$; case $|\phi(t)|$ periodic. It is well known that if there is a positive value of $t$ for which $|\phi(t)|=1$, then $F(x)$ must be a lattice distribution and $|\phi(t)|$ must be periodic; and, conversely, if $F(x)$ is a lattice distribution, then $|\phi(t)|$ must be periodic. Throughout this section, we suppose that $F(x)$ is a d.f. for which (1.1) holds and $|\phi(t)|$ is periodic. Then $|\phi(t)|$ has a least positive period, which we call $2T$.

To estimate $C_n^{(4)}$ we start with a method employed in [2] for the case in which $F(x)$ is the symmetric binomial d.f. From (2.7) we obtain

$$C_n^{(4)}=\frac{1}{n^{1/2}\pi}\sum_{k=1}^{\infty}\int_{2kT-T}^{2kT+T}|\phi(t)|^{2n}\frac{dt}{t^2}. \tag{6.1}$$

Since $|\phi(t)|$ has period $2T$, this implies that

$$C_n^{(4)}=\frac{1}{n^{1/2}\pi}\int_{-T}^{T}|\phi(t)|^{2n}S(t)\,dt, \tag{6.2}$$

where

$$S(t)=\sum_{k=1}^{\infty}(2kT+t)^{-2}.$$

Since $|\phi(-t)|=|\phi(t)|$, it follows that

$$C_n^{(4)}=\frac{1}{n^{1/2}\pi}\int_0^{T}|\phi(t)|^{2n}S_1(t)\,dt, \tag{6.21}$$

where $S_1(t)=S(t)+S(-t)$ and hence

$$S_1(t)=\sum_{k=1}^{\infty}\left[(2kT+t)^{-2}+(2kT-t)^{-2}\right]. \tag{6.22}$$

The function $S_1(t)$ is, as a function of a complex variable $t$, analytic except for double poles at the points $\pm 2T,\pm 4T,\pm 6T,\ldots$ and, as we shall see, it is an elementary function. From (6.22) we obtain

$$S_1(t)=\frac{d}{dt}\sum_{k=1}^{\infty}\left[(2kT-t)^{-1}-(2kT+t)^{-1}\right]=\frac{d}{dt}\left[\frac{t}{2T^2}\sum_{k=1}^{\infty}\frac{1}{k^2-(t/2T)^2}\right].$$











Using the standard formula

$$\sum_{k=1}^{\infty}\frac{1}{k^2-z^2}=\frac{1-\pi z\cot\pi z}{2z^2}, \tag{6.31}$$

which is valid when $z$ is not an integer, gives

$$S_1(t)=\frac{d}{dt}\left[\frac{1}{t}-\frac{\pi}{2T}\cot\frac{\pi t}{2T}\right]. \tag{6.32}$$

From this we obtain

$$S_1(t)=\left(\frac{\pi}{2T}\right)^2\left[\csc^2x-x^{-2}\right], \tag{6.33}$$

where $x=\pi t/2T$. Differentiating the ordinary power series expansion of $x^{-1}-\cot x$ gives a representation of the right side as a power series in $x$. Putting $x=\pi t/2T$ in this power series gives

$$S_1(t)=\left(\frac{\pi}{2T}\right)^2\left[\frac13+\frac{1}{15}\left(\frac{\pi t}{2T}\right)^2+\frac{2}{189}\left(\frac{\pi t}{2T}\right)^4+\frac{1}{675}\left(\frac{\pi t}{2T}\right)^6+\cdots\right]. \tag{6.34}$$

The numerical coefficients in (6.34) have simple expressions in terms of Bernoulli numbers, and the expansion is valid when $|t|<2T$. Differentiating (6.22) gives

$$S_1'(t)=4t\sum_{k=1}^{\infty}\frac{12k^2T^2+t^2}{(4k^2T^2-t^2)^3}. \tag{6.35}$$

This shows that $S_1'(t)>0$ when $0<t<T$ and hence that $S_1(t)$ is increasing when $0\le t\le T$. With the aid of (6.34) and (6.33), we see that

$$\frac13\left(\frac{\pi}{2T}\right)^2=S_1(0)\le S_1(t)\le S_1(T)=\left(1-\frac{4}{\pi^2}\right)\left(\frac{\pi}{2T}\right)^2 \tag{6.36}$$

when $0\le t\le T$. This shows that we could delete the factor $S_1(t)$ from the integrand in (6.21) without changing the order of magnitude of $C_n^{(4)}$.
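The three descriptions of $S_1(t)$, namely the series (6.22), the closed form (6.33), and the truncated power series (6.34), can be compared numerically. The sketch takes $T=\pi/2$ (the symmetric Bernoulli case of Section 7) and truncates (6.22) after 200000 terms; both are assumptions made only for this check.

```python
import math

def s1_sum(t: float, T: float, terms: int = 200_000) -> float:
    """Truncation of the series (6.22)."""
    return sum((2 * k * T + t) ** -2 + (2 * k * T - t) ** -2 for k in range(1, terms + 1))

def s1_closed(t: float, T: float) -> float:
    """Closed form (6.33): (pi/2T)^2 [csc^2 x - x^-2] with x = pi t / 2T, for 0 < t <= T."""
    x = math.pi * t / (2 * T)
    return (math.pi / (2 * T)) ** 2 * (1 / math.sin(x) ** 2 - 1 / x ** 2)

def s1_series(t: float, T: float) -> float:
    """First four terms of the power series (6.34)."""
    x = math.pi * t / (2 * T)
    return (math.pi / (2 * T)) ** 2 * (1 / 3 + x ** 2 / 15 + 2 * x ** 4 / 189 + x ** 6 / 675)

T = math.pi / 2
print(s1_sum(0.5, T), s1_closed(0.5, T), s1_series(0.5, T))
```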
We now improve the formula (6.21) by showing that

$$C_n^{(4)}=o(n^{-\infty})+\frac{1}{n^{1/2}\pi}\int_0^{\delta_n}|\phi(t)|^{2n}S_1(t)\,dt, \tag{6.4}$$

where $\delta_n=(\log n)/n^{1/2}$ as in (2.22). For this purpose, let

$$E_n=\frac{1}{n^{1/2}\pi}\int_{\delta_n}^{T}|\phi(t)|^{2n}S_1(t)\,dt. \tag{6.41}$$

Letting $\psi(t)=|\phi(t)|^2$ and using (6.36) gives, for some constant $M$,

$$E_n\le M\int_{\delta_n}^{T}|\psi(t)|^n\,dt. \tag{6.42}$$

Since $\psi(t)$ is continuous over $0\le t\le T$, $\psi(0)=1$, $\psi'(0)=0$, $\psi''(0)=-2$, and $0\le\psi(t)<1$ when $0<t\le T$, it follows that, when $n$ is sufficiently great,

$$\max_{\delta_n\le t\le T}|\psi(t)|<1-\delta_n^2/2. \tag{6.43}$$

This and (6.42) show that

$$E_n\le MT\left[1-\delta_n^2/2\right]^n=o(n^{-\infty}). \tag{6.44}$$

Hence (6.21) and (6.41) imply (6.4), and (6.4) is proved.

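Our estimate of $C_n^{(4)}$ will come from (6.4). As in Section 3 we suppose that, for some integer $m$ for which $m\ge 3$, the moments $\alpha_m$ and $\beta_m$ exist. The formulas (3.3) and (3.31) are then valid and, supposing henceforth that $0<t<\delta_n$, we can use the formula

$$e^{w}=1+\sum_{k=1}^{m-1}\frac{w^k}{k!}+O(w^m) \tag{6.5}$$

to obtain a formula giving $e^{w}$ as the sum of a finite number of terms involving $n$, $t$, and $\gamma_3,\gamma_4,\ldots,\gamma_m$. When this finite sum is written down, it is found that

$$e^{w}=u+iv,\qquad|e^{w}|^2=u^2+v^2, \tag{6.51}$$

where

$$u=1+\frac{\gamma_4}{24}nt^4-\frac{\gamma_3^2}{72}n^2t^6+\cdots, \tag{6.52}$$

$$v=-\frac{\gamma_3}{6}nt^3+\frac{\gamma_5}{120}nt^5-\frac{\gamma_3\gamma_4}{144}n^2t^7+\frac{\gamma_3^3}{1296}n^3t^9+\cdots, \tag{6.53}$$

and remarks analogous to those following (3.44) are applicable. From (6.4), (3.3), (6.51), and (6.34), we find that

$$C_n^{(4)}=o(n^{-\infty})+\frac{1}{n^{1/2}\pi}\int_0^{\delta_n}e^{-nt^2}G_n(t)\,dt, \tag{6.54}$$

where

$$G_n(t)=\left(\frac{\pi}{2T}\right)^2\left[\frac13+\frac{1}{15}\left(\frac{\pi}{2T}\right)^2t^2+\frac{\gamma_4}{36}nt^4+\cdots\right]. \tag{6.55}$$

Using the values in (3.54) of the integrals in (3.5) then gives

$$C_n^{(4)}=\frac{B_1}{n}+\frac{B_2}{n^2}+\cdots+\frac{B_k}{n^k}+O\!\left(\frac{1}{n^{k+1}}\right), \tag{6.6}$$

where

$$B_1=\frac{1}{6\pi^{1/2}}\left(\frac{\pi}{2T}\right)^2, \tag{6.61}$$

$$B_2=\frac{1}{60\pi^{1/2}}\left(\frac{\pi}{2T}\right)^4+\frac{\gamma_4}{96\pi^{1/2}}\left(\frac{\pi}{2T}\right)^2, \tag{6.62}$$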





and each of the constants $B_1,B_2,B_3,\ldots$ depends only upon $T$ and a finite number of the semi-invariants $\gamma_3,\gamma_4,\ldots$. Unlike the constants $A_1,A_2,\ldots$ in (3.6), the constant $B_1$ in (6.6) can never be zero. In case the given d.f. $F(x)$ has finite moments of all positive integer orders, the integer $m$ can be chosen as great as we wish, and (6.6) is then valid for each $k=1,2,3,\ldots$.

Our results show that if $F(x)$ has finite moments of all positive orders, and if the c.f. $\phi(t)$ is such that $|\phi(t)|$ is periodic, then (1.7) is valid when $D_k=A_k+B_k$ and the constants $A_k$ and $B_k$ are the constants in (3.6) and (6.6).
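The constants (6.61) and (6.62) depend only on $T$ and $\gamma_4$. The sketch below evaluates them and, for the symmetric Bernoulli d.f. ($T=\pi/2$, $\gamma_4=-2$, taken from Section 7), adds $A_2=5\gamma_4^2/(3072\pi^{1/2})$ from (3.62) to recover $D_2=A_2+B_2$.

```python
import math

def b_constants(T: float, gamma4: float):
    """B_1 and B_2 from (6.61)-(6.62); 2T is the least positive period of |phi(t)|."""
    r = (math.pi / (2 * T)) ** 2          # (pi / 2T)^2
    root_pi = math.sqrt(math.pi)
    b1 = r / (6 * root_pi)
    b2 = r ** 2 / (60 * root_pi) + gamma4 * r / (96 * root_pi)
    return b1, b2

# Symmetric Bernoulli d.f.: |cos t| has period pi, so T = pi/2, and gamma_4 = -2.
b1, b2 = b_constants(math.pi / 2, -2)
a2 = 5 * (-2) ** 2 / (3072 * math.sqrt(math.pi))   # A_2 from (3.62) with gamma_3 = 0
print(b1, b2, a2 + b2)                              # a2 + b2 is D_2
```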


7. The global version of the De Moivre theorem on binomial distributions. Let $0<p<1$ and let $F(x)$ be the binomial d.f., associated with the probability $p$, which has mean 0 and standard deviation 1. To simplify our formulas, we define two constants $h$ and $\beta$ related to $p$ and to each other by the formulas

$$h=[p/(1-p)]^{1/2},\qquad p=h^2/(1+h^2), \tag{7.1}$$

$$\beta=(h+h^{-1})^{-1}=[p(1-p)]^{1/2}. \tag{7.11}$$

A random variable $\xi$ governed by $F(x)$ has the value $-h^{-1}$ with probability $p$ and the value $h$ with probability $1-p$. Hence $F(x)=0$ when $x<-h^{-1}$, $F(x)=p$ when $-h^{-1}\le x<h$, and $F(x)=1$ when $x\ge h$. A classic theorem of De Moivre states that, under these conditions, (1.21) holds. In case $p=\tfrac12$ and $F(x)$ is symmetric, the constant $C_n$ in (1.3) has been estimated in [2] and the result is given in (1.6). We now treat the general case and shall show that there exist constants $D_1,D_2,\ldots$, depending only upon $p$, such that

$$C_n=D_1n^{-1}+D_2n^{-2}+\cdots+D_kn^{-k}+O(n^{-k-1}) \tag{7.2}$$

for each $k=1,2,3,\ldots$. Moreover

$$D_1=\frac{1}{24\pi^{1/2}}\cdot\frac{1+(\tfrac12-p)^2}{p(1-p)}. \tag{7.21}$$

It is a straightforward but tedious task to extend our work to obtain explicit formulas for $D_2,D_3,\ldots$.

The definition of $F(x)$ implies that $F(x)$ has finite moments of all positive integer orders and hence that (3.6) is valid. From (2.01) and the definition of $F(x)$ we obtain

$$\phi(t)=pe^{-ih^{-1}t}+(1-p)e^{iht}. \tag{7.3}$$

Since $h+h^{-1}=\beta^{-1}$ by (7.11), factoring $e^{-ih^{-1}t}$ out of (7.3) shows that, while $\phi(t)$ is not necessarily periodic,

$$|\phi(t)|=\left|[p+(1-p)\cos(t/\beta)]+i[(1-p)\sin(t/\beta)]\right|=\left[p^2+(1-p)^2+2p(1-p)\cos(t/\beta)\right]^{1/2}, \tag{7.31}$$

and hence that $|\phi(t)|$ has least period $2\pi\beta$. Therefore (6.6) is valid with $2T=2\pi\beta$, and hence $\pi/2T=(2\beta)^{-1}$. Using the notation of (3.6) and (6.6), we see that (7.2) is valid when $D_k=A_k+B_k$.





To obtain (7.21), we use the formula $D_1=A_1+B_1$, where $A_1$ and $B_1$ are given by (3.61) and (6.61). Since $\gamma_3=\alpha_3$ and

$$\alpha_3=(-h^{-1})^3p+h^3(1-p)=h-h^{-1}, \tag{7.4}$$

$$\alpha_3^2=h^2+h^{-2}-2=(1-2p)^2/p(1-p), \tag{7.41}$$

we see from (3.61) that

$$A_1=(24\pi^{1/2})^{-1}(\tfrac12-p)^2/p(1-p). \tag{7.42}$$

Since

$$(\pi/2T)^2=1/4\beta^2=1/4p(1-p), \tag{7.5}$$

we see from (6.61) that

$$B_1=(24\pi^{1/2})^{-1}/p(1-p). \tag{7.51}$$

From (7.42) and (7.51) we obtain (7.21).

When $p=\tfrac12$, the d.f. $F(x)$ of this section reduces to the symmetric binomial or Bernoulli d.f., which we shall treat further in Section 9. In this case $\phi(t)=\cos t$, $|\phi(t)|$ has period $\pi$, and $2T=\pi$. With the aid of (3.21) we obtain $\gamma_3=0$ and $\gamma_4=-2$. Hence (2.9), (3.6), (3.61), (3.62), (6.6), (6.61), and (6.62) give

$$C_n=\frac{1}{6\pi^{1/2}}\cdot\frac{1}{n}+\frac{3}{1280\pi^{1/2}}\cdot\frac{1}{n^2}+O\!\left(\frac{1}{n^3}\right) \tag{7.6}$$

or

$$C_n=.09403\,15973\,n^{-1}+.00132\,23193\,n^{-2}+O(n^{-3}). \tag{7.61}$$
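The derivation of (7.21) can be replayed numerically: compute $\alpha_3$ from the two-point distribution, form $A_1$ by (3.61) and $B_1$ by (6.61) with $2T=2\pi\beta$, and compare with the closed form. A minimal Python sketch (the function names are illustrative):

```python
import math

def d1_binomial(p: float) -> float:
    """D_1 = A_1 + B_1 for the standardized binomial d.f. of Section 7."""
    h = math.sqrt(p / (1 - p))                        # (7.1)
    beta = math.sqrt(p * (1 - p))                     # (7.11)
    alpha3 = (-1 / h) ** 3 * p + h ** 3 * (1 - p)     # (7.4); gamma_3 = alpha_3
    a1 = alpha3 ** 2 / (96 * math.sqrt(math.pi))      # (3.61)
    period = 2 * math.pi * beta                       # least period 2T of |phi|
    b1 = (math.pi / period) ** 2 / (6 * math.sqrt(math.pi))   # (6.61)
    return a1 + b1

def d1_formula(p: float) -> float:
    """Closed form (7.21)."""
    return (1 + (0.5 - p) ** 2) / (24 * math.sqrt(math.pi) * p * (1 - p))

for p in (0.1, 0.25, 0.5, 0.7, 0.9):
    print(p, d1_binomial(p), d1_formula(p))
```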


8. An inequality for $C_n$. Throughout this section we suppose that $F(x)$ is a d.f. having a finite third absolute moment $\beta_3$. It is known ([5], p. 201) that there is an absolute constant $E_1$ such that

$$\operatorname*{l.u.b.}_{-\infty<x<\infty}|F_n(x)-\Phi(x)|\le E_1\beta_3n^{-1/2}\qquad(n=1,2,3,\ldots). \tag{8.1}$$

In the left member of (8.1) we have the distance between $F_n(x)$ and $\Phi(x)$ in the space of bounded measurable functions defined over $-\infty<x<\infty$. It is not unreasonable to conjecture that, for some constant $E_2$, a valid companion inequality is obtained by replacing $E_1$ by $E_2$ and replacing the left member of (8.1) by the distance $C_n^{1/2}$ between $F_n(x)$ and $\Phi(x)$ in the Lebesgue space $L_2(-\infty,\infty)$. While $F_n(x)$ and $\Phi(x)$ themselves cannot belong to the space $L_2$, we know that $F_n(x)-\Phi(x)$ belongs to the space $L_2$ whenever $F(x)$ has finite second moments, and hence whenever $F(x)$ has finite third moments. Thus we conjecture that there is an absolute constant $E_2$ such that

$$C_n^{1/2}\le E_2\beta_3n^{-1/2}\qquad(n=1,2,3,\ldots). \tag{8.2}$$

To eliminate the fractional exponents, we write the conjecture (8.2) in the form

$$C_n\le E_2^2\beta_3^2n^{-1}. \tag{8.3}$$


Evidence that the right member of (8.3) involves $\beta_3$ and $n$ in the correct way is obtained by examining the manner in which $C_n$ depends upon $\beta_3$ when $F(x)$ is the binomial d.f. of Section 7. From (7.2) we obtain $C_n\sim D_1n^{-1}$, where $D_1$ is defined by (7.21). From the definition of $F(x)$ in Section 7, we find that

$$\beta_3=[p^2+(1-p)^2][p(1-p)]^{-1/2}. \tag{8.4}$$

Squaring (8.4) and using the result in (7.21) gives

$$D_1=(24\pi^{1/2})^{-1}Q(p)\beta_3^2, \tag{8.5}$$

where

$$Q(p)=[1+(\tfrac12-p)^2][p^2+(1-p)^2]^{-2}. \tag{8.51}$$

In the range $0<p<1$ where $p$ must lie, we have $5/4<Q(p)\le 4$. Moreover $Q(\tfrac12)=4$. Thus for the binomial d.f. of Section 7

$$C_n\sim(24\pi^{1/2})^{-1}Q(p)\beta_3^2n^{-1}, \tag{8.6}$$

where $5/4<Q(p)\le 4$. While the conjecture involving (8.3) and (8.2) remains unproved, the above estimates show that if (8.3) and (8.2) are universally valid, then, taking $p=\tfrac12$ (so that $Q(p)=4$) and letting $n\to\infty$ in (8.3), we have

$$E_2^2\ge(6\pi^{1/2})^{-1}=.09403\,15973 \tag{8.7}$$

and

$$E_2\ge(6\pi^{1/2})^{-1/2}=.30664\,57195. \tag{8.71}$$
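The range of $Q(p)$ and the numerical bounds (8.7) and (8.71) are easy to reproduce. The sketch below scans a grid of $p$ values; it is a numerical check only, not a proof of $5/4<Q(p)\le 4$.

```python
import math

def Q(p: float) -> float:
    """Q(p) of (8.51)."""
    return (1 + (0.5 - p) ** 2) / (p ** 2 + (1 - p) ** 2) ** 2

values = [Q(j / 1000) for j in range(1, 1000)]     # p = 0.001, ..., 0.999
print(min(values), max(values))

# With p = 1/2 (beta_3 = 1, Q = 4), C_n ~ D_1/n and (8.3) force E_2^2 >= Q(1/2)/(24 pi^(1/2)).
E2_sq_lower = Q(0.5) / (24 * math.sqrt(math.pi))
print(E2_sq_lower, math.sqrt(E2_sq_lower))
```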


9. The symmetric binomial or Bernoulli d.f. Let $F(x)$ be the symmetric binomial or Bernoulli d.f., this being the d.f. of Section 7 with $p=\tfrac12$. This d.f. is commonly associated with problems in coin tossing. It is the purpose of this section to obtain precise information about the constants $C_n$ defined by (1.3). We shall focus our attention upon the formula

$$C_n=\frac{D_1}{n}+\frac{D_2}{n^2}+\frac{D_3}{n^3}+\frac{D_4}{n^4}+\frac{D_5}{n^5}+\frac{R_n}{n^6}, \tag{9.01}$$

where $D_1,D_2,\ldots$ are the constants in the asymptotic expansion of $C_n$ and the numbers $R_1,R_2,\ldots$ are determined by the formula (9.01) itself. Of course the theory of asymptotic expansions assures us that the result of neglecting $R_n$ in the right member of (9.01) gives a good approximation to $C_n$ when $n$ is sufficiently great, but until the matter has been investigated we do not know whether the approximation will be good when $n$ is 5 or 10 or 100.

In the first place, the numerical values of $D_1,D_2,\ldots,D_5$ can be calculated by the methods of Sections 3 and 6. The details of the calculations are quite lengthy and tedious even when full advantage is taken of the fact that the c.f. $\phi(t)$ is now the real periodic function $\cos t$. The right side of (3.2) can be replaced by the known power series expansion of $\log\cos t$, which is obtained by





integrating the expansion of $\tan t$. With the notation of (3.3), $w$ is real and $|e^{w}-1|^2$ can be replaced by $e^{2w}-2e^{w}+1$. Since $|\cos t|$ has period $\pi$, the constant $2T$ of Section 6 is $\pi$. It is found that each of the constants $D_1,D_2,\ldots$ is a rational multiple of $\pi^{-1/2}$ and that

$$D_1=\frac{1}{6\pi^{1/2}}=.09403\,15972\,57959, \tag{9.02}$$

$$D_2=\frac{3}{1280\pi^{1/2}}=.00132\,23193\,36440, \tag{9.03}$$

$$D_3=-\frac{397}{258048\pi^{1/2}}=-.00086\,79907\,01995, \tag{9.04}$$

$$D_4=\frac{53461}{35389440\pi^{1/2}}=.00085\,22920\,77129, \tag{9.05}$$

$$D_5=\frac{2324491}{2076180480\pi^{1/2}}=.00063\,16664\,76919. \tag{9.06}$$

Only one of these five constants is negative, and the author has very little information about $D_k$ when $k>5$.

In order to obtain information about the numbers $R_1,R_2,\ldots$ in (9.01), the values of $C_1,C_2,\ldots,C_{10}$ in (9.1) were calculated by the method which is explained later in this section.

$$\begin{array}{rll}
n & C_n & \Gamma_n\\ \hline
1 & .10244\,13576 & .09506\,98844\\
2 & .04706\,47193 & .04729\,68250\\
3 & .03147\,89023 & .03147\,05293\\
4 & .02358\,02730 & .02358\,07083\\
5 & .01885\,37826 & .01885\,37765\\
6 & .01570\,53613 & .01570\,53651\\
7 & .01345\,79250 & .01345\,79258\\
8 & .01177\,31393 & .01177\,31395\\
9 & .01046\,32283 & .01046\,32284\\
10 & .00941\,56055 & .00941\,56056
\end{array} \tag{9.1}$$

The exact value of $C_1$ is

$$C_1=(2/\pi e)^{1/2}-2\Psi(1)-\pi^{-1/2}+\tfrac12, \tag{9.11}$$

where $\Psi(x)$ is the tabulated Gaussian function defined by (9.29) below, and

$$C_1=.10244\,13576\,27616 \tag{9.12}$$





with uncertainty only in the last figure, which should perhaps be 7. For our next step, the ten-decimal approximations given in (9.1) are scarcely adequate. With the aid of the values of $C_1,\ldots,C_{10}$ which were rounded to obtain the values in (9.1), it is possible to calculate the numbers $R_1,R_2,\ldots,R_{10}$ in (9.01). It is found that the numbers $R_5,R_6,R_7,R_8,R_9$, and $R_{10}$ differ relatively little from $-.0009$, and some heuristic considerations suggest very strongly that $R_n$ differs from $-.0009$ by less than $.00018$ when $n\ge 5$. This in turn suggests that the constant $\Gamma_n$ defined by

$$\Gamma_n=\frac{D_1}{n}+\frac{D_2}{n^2}+\frac{D_3}{n^3}+\frac{D_4}{n^4}+\frac{D_5}{n^5}-\frac{.0009}{n^6} \tag{9.13}$$

must be a very good approximation to $C_n$, at least when $n\ge 5$, and that

$$|\Gamma_n-C_n|<.00018\,n^{-6}\qquad(n\ge 5). \tag{9.14}$$

The values of $\Gamma_1,\Gamma_2,\ldots,\Gamma_{10}$ calculated from (9.13) are given in (9.1), and it is easy to see how $\Gamma_n$ compares with $C_n$ when $1\le n\le 8$. When $n$ is 9 or 10, rounding errors obscure the relationships. After the above results were obtained, the values of $\Gamma_{16}$ and $C_{16}$ were obtained correct to 15 decimal places. The values are

$$\Gamma_{16}=.00588\,19417\,80443, \tag{9.15}$$

$$C_{16}=.00588\,19417\,81902, \tag{9.16}$$

and the agreement is neither better nor worse than was expected. It thus appears that $C_n$ has an exceptionally useful asymptotic expansion and that, for example, use of (9.13) gives

$$C_{100}=.00094\,04473\,45108, \tag{9.17}$$

where the result is correct to the full 15 decimals. It would seem to be a formidable task to obtain even a crude approximation to $C_{100}$ by direct computation.
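Formula (9.13) is straightforward to evaluate. The sketch below uses the 15-decimal values (9.02)-(9.06) and the heuristic value $R_n\approx-.0009$; it reproduces the $\Gamma_n$ column of (9.1) and the values (9.15) and (9.17).

```python
D = (0.094031597257959, 0.001322319336440, -0.000867990701995,
     0.000852292077129, 0.000631666476919)   # D_1, ..., D_5 from (9.02)-(9.06)
R_APPROX = -0.0009                           # heuristic value of R_n used in (9.13)

def gamma_n(n: int) -> float:
    """Gamma_n of (9.13): five terms of the expansion plus -.0009 / n**6."""
    return sum(d / n ** (k + 1) for k, d in enumerate(D)) + R_APPROX / n ** 6

for n in (1, 2, 5, 10, 16, 100):
    print(n, f"{gamma_n(n):.15f}")
```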

We now proceed to obtain the formulas from which the numbers $C_1,C_2,\ldots,C_{10}$ in (9.1) were calculated. Let $H_n=nC_n$, so that $C_n=H_n/n$. Since $\phi(t)=\cos t$, we find from (2.1) that

$$H_n=\frac{n^{1/2}}{\pi}\int_0^{\infty}\left|e^{-nt^2/2}-\cos^nt\right|^2\frac{dt}{t^2}. \tag{9.2}$$

According to R. J. Walker, it is not desirable to undertake to calculate $H_1,H_2,\ldots$ by direct application of a computing machine to the right member of (9.2); it is better to use the following way of expressing $H_n$ as a finite sum of terms which are tabulated or easily calculated. Since $|e^{-nt^2/2}-\cos^nt|^2=2(1-e^{-nt^2/2}\cos^nt)-(1-e^{-nt^2})-(1-\cos^{2n}t)$, from (9.2) we obtain

$$H_n=\frac{n^{1/2}}{\pi}\left[2R_n-P_n-Q_{2n}\right], \tag{9.21}$$

where





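$$P_n=\int_0^{\infty}\frac{1-e^{-nt^2}}{t^2}\,dt,\qquad Q_n=\int_0^{\infty}\frac{1-\cos^nt}{t^2}\,dt, \tag{9.22}$$

$$R_n=\int_0^{\infty}\frac{1-e^{-nt^2/2}\cos^nt}{t^2}\,dt. \tag{9.23}$$

We shall show that, for each $n=1,2,\ldots$,

$$P_n=(n\pi)^{1/2}, \tag{9.24}$$

$$Q_{2n-1}=Q_{2n}=\frac{n\pi}{2^{2n}}\binom{2n}{n}=\frac{\pi}{2}\cdot\frac{3\cdot5\cdot7\cdots(2n-1)}{2\cdot4\cdot6\cdots(2n-2)}, \tag{9.25}$$

and

$$R_n=S_n+T_n, \tag{9.26}$$

where

$$S_n=\frac{(2\pi n)^{1/2}}{2^{n+1}}\sum_{k=0}^{n}\binom{n}{k}e^{-(n-2k)^2/2n}, \tag{9.27}$$

$$T_n=\frac{\pi}{2^{n+1}}\sum_{k=0}^{n}\binom{n}{k}|n-2k|\left[1-2\Psi\!\left(|n-2k|\,n^{-1/2}\right)\right], \tag{9.28}$$

and $\Psi(x)$ is the thoroughly tabulated Gaussian function

$$\Psi(x)=(2\pi)^{-1/2}\int_x^{\infty}e^{-y^2/2}\,dy. \tag{9.29}$$

In (9.27) and (9.28), $\sum_{k=0}^{n}$ can be replaced by $2\sum^{*}_{k\le n/2}$, where the star on the $\sum$ signifies that when $n$ is even the term for which $k=n/2$ is to be divided by 2. The numbers $H_n$ are calculated from (9.21) with the aid of (9.24), (9.25), and (9.26). We shall omit these calculations, and hence it remains only for us to establish (9.24), (9.25), and (9.26).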

Starting with (9.22) and using standard integral formulas gives

$$P_n=\int_0^{\infty}dt\int_0^{n}e^{-xt^2}\,dx=\int_0^{n}dx\int_0^{\infty}e^{-xt^2}\,dt=\frac{\pi^{1/2}}{2}\int_0^{n}x^{-1/2}\,dx=(n\pi)^{1/2}, \tag{9.3}$$

and (9.24) is established.

We now establish (9.25) by a method which exhibits material we shall use to establish the more complicated formula (9.26). Using the Euler formula for $\cos t$ and the binomial formula we obtain, when $t$ is real,

$$\cos^nt=2^{-n}(e^{it}+e^{-it})^n=2^{-n}\sum_{k=0}^{n}\binom{n}{k}\cos(n-2k)t, \tag{9.4}$$

and hence

$$1-\cos^nt=2^{-n}\sum_{k=0}^{n}\binom{n}{k}\{1-\cos(n-2k)t\}. \tag{9.41}$$





Putting this in the second of the formulas (9.22) and using the standard integral formula $\int_0^\infty(1-\cos at)\,t^{-2}\,dt=\pi|a|/2$ gives

$$Q_n=\frac{\pi}{2^{n+1}}\sum_{k=0}^{n}\binom{n}{k}|n-2k|. \tag{9.42}$$

While other proofs of (9.25) may be more elegant, we dismiss the matter with the remark that it is not difficult to use (9.42) to prove (9.25) by induction.
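The induction mentioned above can be spot-checked with exact arithmetic. The sketch compares (9.42) with both forms of (9.25), working with $Q_n/\pi$ so that everything stays rational; the range $n\le 20$ is arbitrary.

```python
from fractions import Fraction
from math import comb, prod

def q_over_pi(n: int) -> Fraction:
    """Q_n / pi from (9.42), computed exactly."""
    return Fraction(sum(comb(n, k) * abs(n - 2 * k) for k in range(n + 1)), 2 ** (n + 1))

def q_closed_over_pi(n: int) -> Fraction:
    """Middle member of (9.25) divided by pi: n * C(2n, n) / 2**(2n)."""
    return Fraction(n * comb(2 * n, n), 4 ** n)

def q_product_over_pi(n: int) -> Fraction:
    """Last member of (9.25) divided by pi: (1/2) [3*5*...*(2n-1)] / [2*4*...*(2n-2)]."""
    odd = prod(range(3, 2 * n, 2))        # empty product = 1 when n = 1
    even = prod(range(2, 2 * n - 1, 2))
    return Fraction(odd, 2 * even)

for n in range(1, 6):
    print(n, q_over_pi(2 * n - 1), q_over_pi(2 * n), q_closed_over_pi(n))
```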
To establish (9.26) we suppose that $n$ is a fixed positive integer, put

$$G(x)=\int_0^{\infty}\{1-e^{-xt^2/2}\cos^nt\}\,t^{-2}\,dt, \tag{9.5}$$

and observe that $G(n)=R_n$ and $G(0)=Q_n$. Differentiating (9.5) gives

$$G'(x)=\frac12\int_0^{\infty}e^{-xt^2/2}\cos^nt\,dt. \tag{9.51}$$

Use of (9.4) gives

$$G'(x)=\frac{1}{2^{n+1}}\sum_{k=0}^{n}\binom{n}{k}\int_0^{\infty}e^{-xt^2/2}\cos(n-2k)t\,dt, \tag{9.52}$$

and use of a standard integral formula then gives

$$G'(x)=\frac{(2\pi)^{1/2}}{2^{n+2}}\sum_{k=0}^{n}\binom{n}{k}x^{-1/2}e^{-(n-2k)^2/2x}. \tag{9.53}$$

Defining $I(m)$ by the formula

$$I(m)=\int_0^{n}x^{-1/2}e^{-m^2/2x}\,dx, \tag{9.54}$$

we use (9.53) and the fact that $G(n)=R_n$ and $G(0)=Q_n$ to obtain

$$R_n=Q_n+\frac{(2\pi)^{1/2}}{2^{n+2}}\sum_{k=0}^{n}\binom{n}{k}I(n-2k). \tag{9.6}$$

Our next step is to obtain a better formula for $I(m)$. Suppose first that $m\ne 0$. The change of variable $x=m^2/t^2$ in (9.54) gives

$$I(m)=2|m|\int_{|m|n^{-1/2}}^{\infty}t^{-2}e^{-t^2/2}\,dt. \tag{9.61}$$

Using the well known formula

$$\int_a^{\infty}t^{-2}e^{-t^2/2}\,dt=a^{-1}e^{-a^2/2}-\int_a^{\infty}e^{-t^2/2}\,dt\qquad(a>0), \tag{9.62}$$

which is easily derived by integration by parts, gives

$$I(m)=2n^{1/2}e^{-m^2/2n}-2|m|\int_{|m|n^{-1/2}}^{\infty}e^{-t^2/2}\,dt. \tag{9.63}$$





In case $m=0$, an easy evaluation of the right members of (9.54) and (9.63) shows that (9.63) is still valid. Substituting (9.63) in (9.6), writing the last integral in (9.63) as $(2\pi)^{1/2}\Psi(|m|n^{-1/2})$, and using (9.42) gives (9.26). This completes the derivations of the formulas used to obtain numerical values of $C_1,\ldots,C_{10}$ and $C_{16}$. The tables of the exponential and probability functions put out by the U.S. National Bureau of Standards were used.
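Formulas (9.21)-(9.29) reduce each $C_n$ to finite sums of exponentials and normal tail values. A Python sketch is given below; it uses `math.erfc` for $\Psi$ in place of the printed tables the author used, and it also evaluates the closed form (9.11) as a cross-check. The function names are illustrative.

```python
import math

def psi(x: float) -> float:
    """Psi(x) of (9.29): the upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_n(n: int) -> float:
    """Q_n from (9.42)."""
    return math.pi / 2 ** (n + 1) * sum(math.comb(n, k) * abs(n - 2 * k) for k in range(n + 1))

def r_n(n: int) -> float:
    """R_n = S_n + T_n, with S_n from (9.27) and T_n from (9.28)."""
    scale = 2 ** (n + 1)
    s = math.sqrt(2 * math.pi * n) / scale * sum(
        math.comb(n, k) * math.exp(-((n - 2 * k) ** 2) / (2 * n)) for k in range(n + 1))
    t = math.pi / scale * sum(
        math.comb(n, k) * abs(n - 2 * k) * (1 - 2 * psi(abs(n - 2 * k) / math.sqrt(n)))
        for k in range(n + 1))
    return s + t

def c_n(n: int) -> float:
    """C_n = H_n / n, with H_n from (9.21) and P_n = (n pi)^(1/2) from (9.24)."""
    h = math.sqrt(n) / math.pi * (2 * r_n(n) - math.sqrt(n * math.pi) - q_n(2 * n))
    return h / n

# Closed form (9.11) for C_1, used as an independent check.
c1_exact = math.sqrt(2 / (math.pi * math.e)) - 2 * psi(1.0) - 1 / math.sqrt(math.pi) + 0.5

for n in range(1, 11):
    print(n, f"{c_n(n):.10f}", f"H_n = {n * c_n(n):.8f}")
```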

While our work does not actually prove the result, it indicates very strongly that the sequence $C_1,C_2,C_3,\ldots$ converges monotonically to zero and hence that, in the mean square sense, each one of the distribution functions $F_1(x),F_2(x),F_3(x),\ldots$ is more nearly Gaussian than its predecessors. There was a time when the author rather expected that the sequence $H_1,H_2,H_3,\ldots$ defined by $H_n=nC_n$ would also be monotone, but it turns out that this is not so. In fact

$$H_1=.10244\,136,\qquad H_2=.09412\,944,\qquad H_3=.09443\,671,\qquad H_4=.09432\,109. \tag{9.7}$$

As a check upon these values and upon the relative values of $H_1$, $H_2$, and $H_3$, the author, at that time, calculated $H_1$, $H_2$, and $H_3$ by a method completely independent of the calculations of this section and of the theories upon which they are based. By use of the distribution functions $F_1(x)$, $F_2(x)$, $F_3(x)$ and the formula (1.3) itself, the constants $C_1$, $C_2$, and $C_3$ were calculated by use of the Simpson parabolic formula for approximate evaluation of integrals. The resulting values of $H_1$, $H_2$, and $H_3$ were found to agree to 6 decimal places with the values in (9.7).
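The independent check described above can be imitated as follows: integrate $[F_n(x)-\Phi(x)]^2$ directly, piece by piece between the jumps of $F_n$, with Simpson's parabolic rule. This is a sketch in the spirit of that check, not the author's computation; the cutoff $|x|\le 12$ and the 2000 subintervals per piece are assumptions chosen so that the neglected errors are negligible.

```python
import math

def Phi(x: float) -> float:
    """Standard normal d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson (parabolic) rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def c_direct(n: int, L: float = 12.0) -> float:
    """Integral (1.3) of [F_n(x) - Phi(x)]^2 over [-L, L] in the symmetric Bernoulli case.

    F_n is the step d.f. of (X_1 + ... + X_n) / n**0.5 with independent X_i = +1 or -1,
    each with probability 1/2; its atoms are (2j - n) / n**0.5, j = 0, ..., n.
    The integrand for |x| > L = 12 is below 1e-60 and is ignored.
    """
    atoms = [(2 * j - n) / math.sqrt(n) for j in range(n + 1)]
    edges = [-L] + atoms + [L]
    total, level = 0.0, 0.0          # level = value of F_n on the current piece
    for i in range(len(edges) - 1):
        c = level
        total += simpson(lambda x: (c - Phi(x)) ** 2, edges[i], edges[i + 1])
        if i <= n:                   # crossing atom i adds its binomial mass
            level += math.comb(n, i) / 2 ** n
    return total

for n in (1, 2, 3):
    print(n, f"{c_direct(n):.10f}", f"H_n = {n * c_direct(n):.7f}")
```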


REFERENCES

[1] R. P. Agnew, "Global versions of the central limit theorem," Proc. Nat. Acad. Sci. (USA), Vol. 40 (1954), pp. 800-804.

[2] R. P. Agnew, "Estimates for global central limit theorems," Ann. Math. Stat., Vol. 28 (1957), pp. 26-42.

[3] Harald Cramér, Random Variables and Probability Distributions, Cambridge Univ. Press, 1937, 121 pp.

[4] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, 575 pp.

[5] C. G. Esseen, "Fourier analysis of distribution functions; a mathematical study of the Laplace-Gaussian law," Acta Mathematica, Vol. 77 (1945), pp. 1-125.

[6] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Translation from Russian by K. L. Chung), Cambridge Univ. Press, 1954, 264 pp.

[7] T. N. Thiele, Theory of Observations, London, 1903, 143 pp.





ON THE EXISTENCE OF A BEST APPROXIMATION OF ONE 
DISTRIBUTION FUNCTION BY ANOTHER OF 
A GIVEN TYPE 


By D. L. Burkholder
University of Illinois 


1. Introduction and summary. For well over two centuries mathematicians have considered the conditions under which it is possible to obtain a good approximation of one probability distribution by another of a given type. However, the conditions which assure the existence of a best approximation of a given type seem to have been virtually neglected. Because of their intrinsic interest and because of their relevance for an estimation problem which is discussed later, such conditions are examined here with respect to the following example: Suppose that $F$ and $G$ are distribution functions and that an ordered real number pair $(a,b)$, with $a>0$, is desired such that $F(ax+b)$ is close to $G(x)$ for all real $x$. Is there a best pair? For instance, is there a pair $(a_0,b_0)$ satisfying

$$\sup_{-\infty<x<\infty}|F(a_0x+b_0)-G(x)|=\inf_{\substack{0<a<\infty\\-\infty<b<\infty}}\ \sup_{-\infty<x<\infty}|F(ax+b)-G(x)|\,? \tag{1}$$

In this note we give an example in which a pair $(a_0,b_0)$ satisfying (1) does not exist. We then prove two theorems, each giving a simple sufficient condition for the existence of such a pair. One or the other condition is almost always satisfied in practice. For example, the first requires merely that both of the sets $\{x\mid\tfrac13\le F(x)\le\tfrac23\}$ and $\{x\mid\tfrac13\le G(x)\le\tfrac23\}$ be nondegenerate. Next, we show that in any case, if the set of minimizing pairs is nonempty, then it is convex. This fact is used to obtain a fairly precise description of the set of minimizing pairs for the case in which $F$ is increasing and continuous. In this case, simple conditions on $G$ imply the uniqueness of a minimizing pair. Applications, especially to an estimation problem involving an unknown scale and location parameter, are then discussed.

Throughout the paper, the right hand side of (1) is denoted by $M$. Also, $F$ and $G$ are understood to be continuous on the right.


2. An example. Let F be the normal distribution function with mean 0 and 
variance 1. Let 6G(x) = 1 — (1 — e*)"’ if2 <0,=5+(1-—e%)", ifx 20. 
Here, M = 4 since M 2 3 and 


sup |F(ax) — G(x)| = supoezeiya (G(x) — F(ar)| S G(1/a) — F(O) 


which approaches 4 as a increases indefinitely. For any pair (a, b), 
sup |F(ax + 6b) — G(x)| = sup |F(axr) — G(x)| = sup (G(x) — F(axr)| > 4 
since G(x) — F(ax) approaches $ as x approaches 0 through positive values 
of x and there is an e > 0 such that the derivative with respect to x of G(x) — 
Received July 5, 1958; revised December 29, 1958. 







BEST APPROXIMATION OF DISTRIBUTION FUNCTION 739 


$F(ax)$ is positive if $0<x<\epsilon$. Thus a pair $(a_0,b_0)$ satisfying (1) does not exist here.


3. Sufficient conditions for existence.

THEOREM 1. If both of the sets $\{x\mid\tfrac13\le F(x)\le\tfrac23\}$ and $\{x\mid\tfrac13\le G(x)\le\tfrac23\}$ are nondegenerate, then there is a real number pair $(a_0,b_0)$, with $a_0>0$, satisfying (1).

LEMMA. If $A$ is a closed bounded set of positive numbers and $B$ is a closed bounded set of real numbers, then there is a number $a_0$ in $A$ and a number $b_0$ in $B$ such that

$$\sup_{-\infty<x<\infty}|F(a_0x+b_0)-G(x)|=\inf_{a\in A,\,b\in B}\ \sup_{-\infty<x<\infty}|F(ax+b)-G(x)|. \tag{2}$$


PROOF OF LEMMA. Let $M'$ denote the right hand side of (2). There is a sequence $a_1,a_2,\ldots$ in $A$ and a sequence $b_1,b_2,\ldots$ in $B$ such that

$$\sup_{-\infty<x<\infty}|F(a_nx+b_n)-G(x)|<M'+1/n$$

for each positive integer $n$, and by the Bolzano-Weierstrass theorem these sequences can be chosen so that

$$\lim_{n\to\infty}a_n=a_0,\qquad\lim_{n\to\infty}b_n=b_0, \tag{3}$$

where $a_0\in A$ and $b_0\in B$ because $A$ and $B$ are closed. Let $S$ be the set of numbers $z$ such that $F(a_0x+b_0)$ is continuous in $x$ at $z$. If $z$ is in $S$, then $|F(a_nz+b_n)-G(z)|<M'+1/n$ for each positive integer $n$ and thus, by (3), $|F(a_0z+b_0)-G(z)|\le M'$. Since $S$ is a dense subset of the real numbers and $F$ and $G$ are continuous on the right, we have that $|F(a_0x+b_0)-G(x)|\le M'$ for all real $x$, which implies the desired result.

PROOF OF THEOREM 1. We first prove that $M\le\tfrac13$. The assumptions imply the existence of numbers $p_1,p_2,q_1,q_2$ such that $p_1<p_2$, $q_1<q_2$, $F(p_i-)\le i/3\le F(p_i)$, $G(q_i-)\le i/3\le G(q_i)$, $i=1,2$. Thus, each of

$$F\!\left(\frac{p_1(q_2-x)+p_2(x-q_1)}{q_2-q_1}\right)$$

and $G(x)$ is in the interval $[0,\tfrac13]$ if $x<q_1$, each is in $[\tfrac13,\tfrac23]$ if $q_1\le x<q_2$, and each is in $[\tfrac23,1]$ if $q_2\le x$. This implies that $M\le\tfrac13$.

If $M=\tfrac13$, then the desired pair $(a_0,b_0)$ exists by the above paragraph. Suppose $M<\tfrac13$. Let $M<N<\tfrac13$. There are numbers $r_0,\ldots,s_3$ such that $r_0<r_1<r_2<r_3$, $s_0<s_1<s_2<s_3$, $\tfrac13\le F(r_i)\le\tfrac23$, $\tfrac13\le G(s_i)\le\tfrac23$, $i=1,2$, and such that each of the numbers $F(r_1)-G(s_0)$, $G(s_1)-F(r_0)$, $F(r_3)-G(s_2)$, $G(s_3)-F(r_2)$ is greater than $N$. If $(a,b)$ is a real number pair with $a>0$ and at least one of the inequalities $r_0<as_1+b$, $r_3>as_2+b$, $s_0<(r_1-b)/a$, $s_3>(r_2-b)/a$ is not satisfied, or equivalently if

$$\max\{r_0-as_1,\ r_2-as_3\}<b<\min\{r_3-as_2,\ r_1-as_0\} \tag{4}$$

is not satisfied, then

$$\sup_{-\infty<x<\infty}|F(ax+b)-G(x)|>N. \tag{5}$$





740 D. L. BURKHOLDER 


For example, suppose $r_0<as_1+b$ is not satisfied. Then $r_0\ge as_1+b$ and

$$|F(as_1+b)-G(s_1)|\ge G(s_1)-F(as_1+b)\ge G(s_1)-F(r_0)>N,$$

implying (5). Let $c_0=(r_2-r_1)/(s_3-s_0)$ and $c_1=(r_3-r_0)/(s_2-s_1)$. Then $0<c_0<c_1$. Let $A=[c_0,c_1]$, $d_0=\inf_{a\in A}\max\{r_0-as_1,\ r_2-as_3\}$, $d_1=\sup_{a\in A}\min\{r_3-as_2,\ r_1-as_0\}$, and $B=[d_0,d_1]$. If $(a,b)$ is a real number pair with $a>0$ such that either $a$ is not in $A$ or $b$ is not in $B$, then $(a,b)$ does not satisfy (4) and hence satisfies (5). For example, if $a>c_1$ then $r_0-as_1>r_3-as_2$ and (4) is not satisfied. Therefore,

$$M=\inf_{a\in A,\,b\in B}\ \sup_{-\infty<x<\infty}|F(ax+b)-G(x)|,$$

and the desired result follows from the lemma.

THEOREM 2. If $F$ is continuous and $G$ is a step function with at most a finite number of discontinuity points in each bounded interval, then there is a real number pair $(a_0,b_0)$, with $a_0>0$, satisfying (1).

PROOF. The only case that needs to be considered here is the one for which there is a number $s$ satisfying $G(s-)<\tfrac13$ and $\tfrac23<G(s)$. The other case is taken care of by Theorem 1. Let $2K=G(s)-G(s-)$. The assumptions imply that $K\le M\le\max\{K,\ G(s-),\ 1-G(s)\}$, and the desired pair $(a_0,b_0)$ is easily seen to exist if $M$ is equal to the right hand side of this expression. Suppose that
M is less. Then there is a number N satisfying M < N < max {G(s—), 
1 — G(s)} < 4. There are numbers ro, --- , 83; such that ro < m1 < re < 73, 
8 < 8 < 8 < 8,8 <8 < &, F(r,;) = 4, F(re) = 3, G(s) = G(s—), G(s) = 
G(s) and such that each of the numbers F(r,;) — G(s), G(s3) — F(r2), and 
max {G(s—) — F(7), F(rz) — G(s)} is greater than N. It follows that if a 
pair (a,b) with a > 0 does not satisfy both as. + b <r; << as+b < rm < as; + b 
and max {r; — (as, + b), as; + b — ro} > 0, then sup |F(az + b) — G(x)| > N. 
It follows that there is a closed bounded positive number set $A$ and a closed bounded real number set $B$ such that

$$M=\inf_{a\in A,\,b\in B}\ \sup_{-\infty<x<\infty}|F(ax+b)-G(x)|,$$

and the desired result follows from the lemma.


4. Convexity of the set of minimizing pairs. Let $Q$ be the set of minimizing pairs. That is, let $Q$ be the set of all pairs $(a_0,b_0)$, with $a_0>0$, satisfying (1).

THEOREM 3. If $Q$ is nonempty, then $Q$ is convex.

PROOF. Suppose that $(c,d)$ and $(e,f)$ are in $Q$ and that $0<\lambda<1$. We are to show that $(a,b)$ is in $Q$, where $a=(1-\lambda)c+\lambda e$ and $b=(1-\lambda)d+\lambda f$. If $x$ satisfies $cx+d\le ex+f$, then $cx+d\le ax+b\le ex+f$, and since $F$ is nondecreasing,

$$-M\le F(cx+d)-G(x)\le F(ax+b)-G(x)\le F(ex+f)-G(x)\le M,$$

so that $|F(ax+b)-G(x)|\le M$. Similarly, the last inequality holds if $x$ satisfies $cx+d>ex+f$. Thus, $(a,b)$ is in $Q$.
THEOREM 4. If $F$ is increasing, then there is a number $t$ and a number $k$ such that $at + b = k$ for each $(a, b)$ in $Q$. If, in addition, $F$ is continuous and $Q$ contains more than one element, then

$$|F(ax + b) - G(x)| < \frac{G(t) - G(t-)}{2} = M \qquad (6)$$








BEST APPROXIMATION OF DISTRIBUTION FUNCTION 741 


for each $x \ne t$ and each $(a, b)$ belonging to the interior of $Q$ relative to the line that contains $Q$.

Thus, if $F$ is increasing, then $Q$ is a convex subset of a nonvertical line. If $F$ is increasing and continuous and $Q$ contains more than one element, then $t$ is the unique number $x$ maximizing $G(x) - G(x-)$, $k$ is the unique number satisfying $2F(k) = G(t-) + G(t)$, $M$ is easily calculated, and so forth.

Proof. Suppose the first assertion is not true. Then either there are three points in $Q$ that are not collinear, or $Q$ is a nondegenerate subset of a vertical line. Both possibilities imply, the former by the convexity of $Q$, that there are pairs $(a, c)$ and $(a, d)$ in $Q$ such that $c < d$. Let $b = (c + d)/2$. Then $(a, b)$ is in $Q$. Either $\sup [F(ax + b) - G(x)] = M$ or $\inf [F(ax + b) - G(x)] = -M$. Suppose the former is true, the other case being similar. Then there is an $x_0$ such that

$$\sup_{-\infty < x < \infty} [F(ax + b) - G(x)] = \max\{F(ax_0 + b) - G(x_0),\ F([ax_0 + b]-) - G(x_0-)\}.$$

Since $F$ is increasing, the right-hand side is less than the corresponding expression with $b$ replaced by $d$, which in turn is less than or equal to $M$. Thus $\sup [F(ax + b) - G(x)] < M$, a contradiction. The first assertion follows.

Suppose that $F$ is continuous as well as increasing. Then, if $(a, b)$ is in $Q$,

$$\inf_{-\infty < x < \infty} [F(ax + b) - G(x)] = -M, \qquad \sup_{-\infty < x < \infty} [F(ax + b) - G(x)] = M. \qquad (7)$$

For suppose otherwise; for example, suppose the second relation is not true. Then $\sup [F(ax + b) - G(x)] < M$, and necessarily the first relation is true. The assumptions on $F$ imply that the left-hand side of each relation is continuous and increasing in $b$. Hence there is a number $c > b$ such that $\inf [F(ax + c) - G(x)] > -M$ and $\sup [F(ax + c) - G(x)] < M$. Thus $\sup |F(ax + c) - G(x)| < M$, a contradiction.

Now suppose that $Q$ contains more than one point and that $(a, b)$ is a relative interior point of $Q$. Then, if $\varepsilon > 0$,

$$-M < \inf_{|x - t| > \varepsilon} [F(ax + b) - G(x)], \qquad \sup_{|x - t| > \varepsilon} [F(ax + b) - G(x)] < M. \qquad (8)$$

Suppose, for example, that the second inequality is not true. Then there is an $x_0 \ne t$ such that $F(ax_0 + b) - G(x_0-) = M$. By assumption there are pairs $(a_1, b_1)$ and $(a_2, b_2)$ in $Q$ such that $a = (a_1 + a_2)/2$ and $b = (b_1 + b_2)/2$. Since $x_0 \ne t$, $ax_0 + b$ is strictly between $a_1 x_0 + b_1$ and $a_2 x_0 + b_2$. Therefore

$$M \ge \max_{i = 1, 2} [F(a_i x_0 + b_i) - G(x_0-)] > F(ax_0 + b) - G(x_0-),$$

a contradiction. By (7) and (8), $M = F(k) - G(t-) = G(t) - F(k)$. Therefore $2M = G(t) - G(t-)$, and the desired inequality in (6) follows from (8).
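The quantity minimized in (1) can be evaluated exactly when $G$ is the distribution function of finitely many equally likely distinct points. For $a > 0$ and continuous nondecreasing $F$, the supremum over $x$ is approached at the jump points of $G$. The Python sketch below is only an illustration, and it rests on our own assumptions: $F$ is taken to be the standard normal distribution function, and the helper names `normal_cdf` and `sup_distance` are hypothetical. Take a single point mass at $t$, so that $G(t) - G(t-) = 1$. Then every pair on the line $at + b = 0$ attains the distance $M = 1/2$, as Theorem 4 describes; here $k = 0$ because $F(0) = 1/2$.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, our assumed choice of F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sup_distance(a: float, b: float, sample, F=normal_cdf) -> float:
    """sup over x of |F(a x + b) - G(x)|, G the empirical cdf of `sample`.

    Assumes a > 0, distinct sample points, and F continuous and nondecreasing.
    Between jumps G is constant and F(a x + b) is monotone, so the supremum is
    reached (as a limit) at a jump point, comparing F with G just before and
    at the jump.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        fx = F(a * x + b)
        # At the (i + 1)-st order statistic, G(x-) = i/n and G(x) = (i + 1)/n.
        worst = max(worst, abs(fx - i / n), abs(fx - (i + 1) / n))
    return worst


# Theorem 4 with a single point mass at t: G(t) - G(t-) = 1, so M = 1/2 and
# every pair on the line a t + b = k, with F(k) = 1/2 (k = 0 here), is in Q.
t = 2.0
for a in (0.5, 1.0, 4.0):
    print(a, sup_distance(a, -a * t, [t]))   # 0.5 for every a
print(sup_distance(1.0, -t + 0.3, [t]))      # about 0.618, off the line
```

Moving the pair off the line $at + b = 0$ strictly increases the distance. This matches the fact that $Q$ lies on a nonvertical line.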
COROLLARY. Suppose that $F$ is increasing and continuous and that $G$ is either (i) continuous, or (ii) a step function with $n > 1$ discontinuity points at which $G$





742 D. L. BURKHOLDER 


has jumps of size $1/n$. Then there is a unique pair $(a_0, b_0)$, with $a_0 > 0$, satisfying (1).

Remark. The condition on $G$ is more special than need be. It could be replaced by the following condition: as $x$ varies, $G(x) - G(x-)$ assumes its maximum value at least twice.

The corollary and remark are immediate consequences of Theorems 1 and 4.


5. Discussion. In practically all cases of interest the results of section 3 imply 
that if G is a distribution function then there is, for example, a best normal 
approximation to G. 

We now discuss an estimation problem involving a scale and location parameter. Suppose that $F$ is an increasing and continuous distribution function. Let $n > 1$ and suppose that $X_1, \ldots, X_n$ are independent random variables, each with the distribution function $F(\cdot\,; \mu, \sigma)$, where $F(x; \mu, \sigma) = F((x - \mu)/\sigma)$ for all real $x$. The parameter $(\mu, \sigma)$ is an element of the parameter space $\Omega$, which is here taken to be the open upper half plane. Since $F$ is continuous, we may restrict ourselves to the set $\mathcal{X}$ of sample points $(x_1, \ldots, x_n)$ with all coordinates distinct. For each $(x_1, \ldots, x_n)$ in $\mathcal{X}$, let $G(\cdot\,; x_1, \ldots, x_n)$ be the corresponding empirical distribution function. We ask whether or not there is an estimate $\delta = (\delta_1, \delta_2)$ of $(\mu, \sigma)$ such that

$$\sup_{-\infty < x < \infty} |F(x; \delta_1(x_1, \ldots, x_n), \delta_2(x_1, \ldots, x_n)) - G(x; x_1, \ldots, x_n)| = \inf_{(\mu, \sigma) \in \Omega}\, \sup_{-\infty < x < \infty} |F(x; \mu, \sigma) - G(x; x_1, \ldots, x_n)| \qquad (9)$$

for each $(x_1, \ldots, x_n)$ in $\mathcal{X}$. Such an estimate would be a minimum distance estimate in the terminology of Wolfowitz, who has studied the role of the empirical distribution function in estimation in very general contexts (see [1] and also the references listed in [1]). The corollary of the previous section implies that a function $\delta$ on $\mathcal{X}$ satisfying (9) does exist and is unique. One question arises: is $\delta$ measurable? It is easy to prove even more: $\delta$ is continuous.
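For the normal case, the estimate $\delta$ of (9) can be approximated numerically. The sketch below is a hedged illustration, not the author's method, and it makes several assumptions:

- $F$ is the standard normal distribution function.
- The model is written as $F(x; \mu, \sigma) = F(ax + b)$ with $a = 1/\sigma$ and $b = -\mu/\sigma$.
- A hypothetical helper `min_distance_estimate` runs a shrinking grid search over $(\mu, \log\sigma)$, starting from the sample mean and standard deviation.

A grid search is not guaranteed to find the unique minimizer whose existence the corollary establishes. It only guarantees never doing worse than its starting point.

```python
import math
import statistics


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def distance(mu: float, sigma: float, sample) -> float:
    """sup_x |F((x - mu)/sigma) - G(x)|, G the empirical cdf (distinct points)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = normal_cdf((x - mu) / sigma)
        d = max(d, abs(fx - i / n), abs(fx - (i + 1) / n))
    return d


def min_distance_estimate(sample, rounds: int = 40, half: int = 5):
    """Shrinking grid search over (mu, log sigma), started at the sample mean
    and standard deviation. Returns (mu, sigma, distance)."""
    mu = statistics.mean(sample)
    log_sigma = math.log(statistics.stdev(sample))
    best = distance(mu, math.exp(log_sigma), sample)
    step_mu, step_ls = statistics.stdev(sample), 1.0
    for _ in range(rounds):
        c_mu, c_ls = mu, log_sigma          # the centre is always on the grid
        for i in range(-half, half + 1):
            for j in range(-half, half + 1):
                m = c_mu + i * step_mu / half
                ls = c_ls + j * step_ls / half
                d = distance(m, math.exp(ls), sample)
                if d < best:
                    best, mu, log_sigma = d, m, ls
        step_mu /= 2
        step_ls /= 2
    return mu, math.exp(log_sigma), best


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7, 2.2]
mu_hat, sigma_hat, dist = min_distance_estimate(sample)
print(mu_hat, sigma_hat, dist)
```

No continuous $F$ can come within less than $1/(2n)$ of an empirical distribution function. At each jump of size $1/n$, the two one-sided differences cannot both be smaller than $1/(2n)$, so the achieved distance is bounded below accordingly.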


REFERENCE

[1] J. Wolfowitz, "The minimum distance method," Ann. Math. Stat., Vol. 28 (1957), pp. 75-88.





BOUNDS ON THE EXPECTATION OF A CONVEX FUNCTION OF 
A MULTIVARIATE RANDOM VARIABLE 


ALBERT MADANSKY 


The RAND Corporation 


1. Introduction. Dresher has shown [2] how certain inequalities can be in- 
terpreted geometrically via the theory of moment spaces of univariate distribu- 
tions. Moment spaces of multivariate distributions will be considered, and, by 
examining the boundary of an appropriate moment space, upper and lower 
bounds on the expectation of a convex function of a vector valued random 
variable will be derived. Finally, the bounds so derived will be improved in 
the case where the elements of the random vector are independent. 


2. Moment spaces of multivariate distributions. Let $\psi(x) = \psi(x_1, \ldots, x_r)$ be an $r$-variate cumulative distribution function over the bounded $r$-dimensional rectangle $I$, and let $\{f_i(x_1, \ldots, x_r) = f_i(x),\ i = 1, \ldots, n\}$ be a set of $n$ continuous functions. The $i$th moment of $\psi(x)$ with respect to $\{f_i(x)\}$ is defined to be $\mu_i(\psi) = \int_I f_i(x)\, d\psi(x)$. The $n$th moment space $M_n$ with respect to $\{f_i(x)\}$ is defined as the set of all points $\mu = (\mu_1, \ldots, \mu_n)$ in $n$-dimensional Euclidean space, $E_n$, whose coordinates are the moments $\mu_1(\psi), \ldots, \mu_n(\psi)$ with respect to $\{f_i(x)\}$ for some distribution function $\psi(x)$.

Let $C_n$ be the surface traced out in $E_n$ by

$$\{z : z_i = f_i(x),\ i = 1, \ldots, n,\ x \in I\}.$$

Let $H_n$ be the convex hull of $C_n$, i.e., the smallest convex set containing $C_n$. Then it can be shown, along the same lines as the proof of Theorem 2 of [1], that $H_n$ is identical with $M_n$, and that $M_n$ is closed, bounded, and convex.

In the following I shall examine $g(X)$, some given continuous convex function of an $r$-dimensional vector valued random variable $X$ defined over the bounded $r$-dimensional rectangle $I$. Let $C_{r+1}$ be the surface traced out in $E_{r+1}$ by $z_1 = x_1, z_2 = x_2, \ldots, z_r = x_r, z_{r+1} = g(x_1, \ldots, x_r) = g(x)$. What I shall do is determine the boundary of $H_{r+1}$, the convex hull of $C_{r+1}$. From this boundary, inequalities on $Eg(X)$ in terms of $g(EX)$ and $\{EX_i,\ i = 1, \ldots, r\}$ will be obtained. A point $z^0$ is said to be on the boundary of $H_{r+1}$ if and only if $z^0$ is in $H_{r+1}$ and there exists a set of real numbers $\beta_0, \beta_1, \ldots, \beta_{r+1}$ such that

$$\sum_{i=1}^{r+1} \beta_i z_i^0 + \beta_0 = 0, \qquad \sum_{i=1}^{r+1} \beta_i z_i + \beta_0 \ge 0 \ \text{ for all } z = (x_1, \ldots, x_r, g(x)) \text{ with } x \text{ in } I.$$

Geometrically, $z^0$ is on the boundary of $H_{r+1}$ if and only if there exists a sup-


Received November 17, 1958; revised February 22, 1959. 
743 





744 ALBERT MADANSKY 
boundary of $H_{r+1}$ if and only if $H_{r+1}$ lies in the negative (positive) half space relative to the supporting hyperplane to $H_{r+1}$ at $z^0$.


3. Boundaries of $H_{r+1}$. Let $I$ be the bounded $r$-dimensional rectangle defined by the $2^r$ vertices of the form $(a_{1e_1}, a_{2e_2}, \ldots, a_{re_r})$. Here $e_i$ $(i = 1, \ldots, r)$ takes on the values 1 and 2, and $a_{i1} < a_{i2}$ for all $i$, say. Let

$$x_{r+1} = g(x_1, \ldots, x_r)$$

be a continuous convex function of $x_1, \ldots, x_r$ defined over all points of $I$. It is easy to see that the points of the form $(x_1, \ldots, x_r, g(x_1, \ldots, x_r))$ form the lower boundary of $H_{r+1}$. They are in $H_{r+1}$, and for any point $(x_1^0, \ldots, x_r^0, g(x_1^0, \ldots, x_r^0))$ there is a supporting hyperplane to $H_{r+1}$ at that point, namely the plane tangent to the surface $x_{r+1} = g(x_1, \ldots, x_r)$ at that point. (Since $g$ is continuous and convex, such a plane exists for each point $(x_1^0, \ldots, x_r^0)$ in $I$.) It is also easy to see that the $2r$ hyperplanes $x_1 = a_{11}$, $x_1 = a_{12}$, $x_2 = a_{21}$, $x_2 = a_{22}$, $\ldots$, $x_r = a_{r1}$, $x_r = a_{r2}$ bound $H_{r+1}$ on its sides.

The upper boundary of $H_{r+1}$ is characterized by the following theorem, which is easily proved and geometrically obvious.

THEOREM 1. The upper boundary of $H_{r+1}$ is identical with the upper boundary of the convex hull of the $2^r$ points $(a_{1e_1}, \ldots, a_{re_r}, g(a_{1e_1}, \ldots, a_{re_r}))$.

Let $H^*(x_1^0, \ldots, x_r^0)$ be the point where the ray

$$\{x_1 = x_1^0, \ldots, x_r = x_r^0,\ x_{r+1} = t \mid -\infty \le t \le \infty\}$$

intersects the upper boundary of $H_{r+1}$.
THEOREM 2. If $X = (X_1, \ldots, X_r)$ is a random vector such that

$$\Pr\{X \in I\} = 1,$$

and $g(X)$ is a continuous convex function of $X$ over the bounded $r$-dimensional rectangle $I$, then

$$g(EX_1, \ldots, EX_r) \le Eg(X_1, \ldots, X_r) \le H^*(EX_1, \ldots, EX_r).$$


Proof. By the above discussion, an arbitrary point $(x_1, \ldots, x_{r+1})$ of $H_{r+1}$ satisfies the inequality $(x_1, \ldots, x_r, g(x_1, \ldots, x_r)) \le (x_1, \ldots, x_r, x_{r+1}) \le (x_1, \ldots, x_r, H^*(x_1, \ldots, x_r))$. Since $M_{r+1} = H_{r+1}$, we can take $(x_1, \ldots, x_{r+1})$ to be $(\mu_1, \ldots, \mu_{r+1})$, where

$$\mu_i = \int_I x_i\, d\psi(x) = EX_i, \quad i = 1, \ldots, r, \qquad \mu_{r+1} = \int_I g(x_1, \ldots, x_r)\, d\psi(x) = Eg(X_1, \ldots, X_r)$$

for some distribution function $\psi(x)$.





MULTIVARIATE RANDOM VARIABLE 745 


4. Discussion. One should note first of all that the left-hand inequality is the familiar Jensen inequality.

When $r = 1$, let us consider the moment space $M_2$ defined by the curves $z_1 = x_1$, $z_2 = g(x_1)$, for $a_{11} \le x_1 \le a_{12}$, $a_{11} < a_{12}$. The convex hull of the points $(a_{11}, g(a_{11}))$ and $(a_{12}, g(a_{12}))$ is the straight line joining them, and so an upper bound due to Edmundson [3] is obtained, namely

$$Eg(X_1) \le \frac{g(a_{12}) - g(a_{11})}{a_{12} - a_{11}}\, (EX_1 - a_{11}) + g(a_{11}).$$
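Both sides of Theorem 2 are explicit when $r = 1$. The following sketch uses a hypothetical helper name, `jensen_edmundson`. It evaluates the Jensen lower bound $g(EX_1)$ and the Edmundson upper bound above for a convex $g$ on $[a_{11}, a_{12}]$. The example $g = \exp$ on $[0, 1]$ is our own choice.

```python
import math


def jensen_edmundson(g, a1: float, a2: float, mean: float):
    """Lower (Jensen) and upper (Edmundson) bounds on E g(X) for convex g,
    when a1 <= X <= a2 with probability 1 and E X = mean."""
    if not (a1 < a2 and a1 <= mean <= a2):
        raise ValueError("need a1 < a2 and a1 <= mean <= a2")
    lower = g(mean)
    upper = (g(a2) - g(a1)) / (a2 - a1) * (mean - a1) + g(a1)
    return lower, upper


# X uniform on [0, 1]: E e^X = e - 1, and the bounds are e^(1/2) and (e - 1)/2 + 1.
lo, hi = jensen_edmundson(math.exp, 0.0, 1.0, 0.5)
print(lo, math.e - 1, hi)
```

A distribution concentrated on the two endpoints attains the upper bound. For that reason the bound cannot be improved using only $EX_1$.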
When $r = 2$, a description of the upper boundary of the appropriate convex hull, $H_3$, can once again be given explicitly. Let

$$D = g(a_{11}, a_{21}) + g(a_{12}, a_{22}) - g(a_{11}, a_{22}) - g(a_{12}, a_{21}).$$

(1) If $D \ge 0$, then the upper boundary of $H_3$ is the plane determined by $(a_{11}, a_{21}, g(a_{11}, a_{21}))$, $(a_{12}, a_{22}, g(a_{12}, a_{22}))$, $(a_{12}, a_{21}, g(a_{12}, a_{21}))$ for

$$x_2 \le a_{21} + \frac{a_{22} - a_{21}}{a_{12} - a_{11}}\, (x_1 - a_{11}),$$

and the plane determined by $(a_{11}, a_{21}, g(a_{11}, a_{21}))$, $(a_{12}, a_{22}, g(a_{12}, a_{22}))$, $(a_{11}, a_{22}, g(a_{11}, a_{22}))$ for

$$x_2 \ge a_{21} + \frac{a_{22} - a_{21}}{a_{12} - a_{11}}\, (x_1 - a_{11}).$$

(2) If $D < 0$, then the upper boundary of $H_3$ is the plane determined by $(a_{11}, a_{21}, g(a_{11}, a_{21}))$, $(a_{12}, a_{21}, g(a_{12}, a_{21}))$, $(a_{11}, a_{22}, g(a_{11}, a_{22}))$ for

$$x_2 \le a_{22} - \frac{a_{22} - a_{21}}{a_{12} - a_{11}}\, (x_1 - a_{11}),$$

and the plane determined by $(a_{12}, a_{22}, g(a_{12}, a_{22}))$, $(a_{12}, a_{21}, g(a_{12}, a_{21}))$, $(a_{11}, a_{22}, g(a_{11}, a_{22}))$ for

$$x_2 \ge a_{22} - \frac{a_{22} - a_{21}}{a_{12} - a_{11}}\, (x_1 - a_{11}).$$

Note that for $D = 0$ the four points considered are coplanar.

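As an example of the use of the inequality, let $r = 1$, $g(X_1) = e^{X_1}$, $a_{11} = 0$, $a_{12} = 1$. Then $Ee^{X_1} \le (e - 1)EX_1 + 1$. As a further example, let $r = 2$, $g(X_1, X_2) = e^{X_1 + X_2}$, $a_{11} = a_{21} = 0$, $a_{12} = a_{22} = 1$. Here $D = 1 + e^2 - 2e = (e - 1)^2 > 0$, so case (1) applies, and

$$Ee^{X_1 + X_2} \le \begin{cases} (e - 1)[EX_1 + EX_2] + (e - 1)^2 EX_2 + 1 & \text{if } EX_1 \ge EX_2, \\ (e - 1)[EX_1 + EX_2] + (e - 1)^2 EX_1 + 1 & \text{if } EX_2 \ge EX_1. \end{cases}$$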


For higher dimensions I have obtained no explicit description of the upper boundary of $H_{r+1}$. However, it is easy to see how to evaluate $H^*(x_1^0, \ldots, x_r^0)$. First, find the equations of the $\binom{2^r}{r+1}$ hyperplanes formed by joining $r + 1$ points of the form $(a_{1e_1}, \ldots, a_{re_r}, g(a_{1e_1}, \ldots, a_{re_r}))$. Then find the maximum value of the $(r + 1)$st coordinate of the points on these hyperplanes







whose first $r$ coordinates are $(x_1^0, \ldots, x_r^0)$. This maximum value is $H^*(x_1^0, \ldots, x_r^0)$.
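$H^*$ can be evaluated numerically in any dimension by the construction just described. In the sketch below, we read "points on these hyperplanes" as points of the simplices spanned by the chosen $r + 1$ vertices. That is, we keep only simplices whose barycentric weights at $x$ are all nonnegative, because extending a face beyond its simplex can overshoot the upper boundary. The helper names are hypothetical, and the small linear solver stands in for any library routine. The example reproduces the bivariate bound for $g(x_1, x_2) = e^{x_1 + x_2}$ on the unit square.

```python
import itertools
import math


def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def upper_boundary(g, lows, highs, x):
    """H*(x): the largest value at x of the affine interpolant of g through
    r + 1 box vertices whose simplex contains x (all barycentric weights >= 0)."""
    r = len(x)
    vertices = list(itertools.product(*zip(lows, highs)))
    best = -math.inf
    for simplex in itertools.combinations(vertices, r + 1):
        # Solve sum_v lam_v * v = x together with sum_v lam_v = 1.
        matrix = [[v[i] for v in simplex] for i in range(r)] + [[1.0] * (r + 1)]
        lam = solve(matrix, list(x) + [1.0])
        if lam is None or min(lam) < -1e-12:
            continue  # singular, or the simplex does not contain x
        best = max(best, sum(w * g(v) for w, v in zip(lam, simplex)))
    return best


g = lambda v: math.exp(v[0] + v[1])
print(upper_boundary(g, [0.0, 0.0], [1.0, 1.0], [0.3, 0.6]))
```

The number of candidate simplices grows like $\binom{2^r}{r+1}$, so this brute-force approach is only practical for small $r$. A linear-programming formulation would scale better.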
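5. A sharper inequality.¹ From the above discussion of the case when $r = 1$, one sees that

$$g(X_1) \le \frac{X_1 - a_{11}}{a_{12} - a_{11}}\, g(a_{12}) + \frac{a_{12} - X_1}{a_{12} - a_{11}}\, g(a_{11}).$$

Since $g(X_1, \ldots, X_r)$ is convex in $X_r$ for fixed $X_1, \ldots, X_{r-1}$, one can use the above formula to obtain

$$g(X_1, \ldots, X_r) \le \frac{X_r - a_{r1}}{a_{r2} - a_{r1}}\, g(X_1, \ldots, X_{r-1}, a_{r2}) + \frac{a_{r2} - X_r}{a_{r2} - a_{r1}}\, g(X_1, \ldots, X_{r-1}, a_{r1}).$$

Since $g(X_1, \ldots, X_j, a_{j+1, e_{j+1}}, \ldots, a_{re_r})$ is convex in $X_j$ for fixed $X_1, \ldots, X_{j-1}$, one can use the above bound successively in the obvious manner to obtain

$$g(X_1, \ldots, X_r) \le \sum_{e_1, \ldots, e_r} \prod_{j=1}^{r} \frac{(-1)^{e_j'} (a_{je_j'} - X_j)}{a_{j2} - a_{j1}}\, g(a_{1e_1}, \ldots, a_{re_r}),$$

where $e_j' = 3 - e_j$ and the sum runs over all $2^r$ choices of $e_j \in \{1, 2\}$. Hence, if the $X_j$'s are independent,

$$Eg(X_1, \ldots, X_r) \le \sum_{e_1, \ldots, e_r} \prod_{j=1}^{r} \frac{(-1)^{e_j'} (a_{je_j'} - EX_j)}{a_{j2} - a_{j1}}\, g(a_{1e_1}, \ldots, a_{re_r}) = H(EX), \text{ say}.$$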
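In the bivariate example above, with $X_1$ and $X_2$ independent, we find that

$$Ee^{X_1 + X_2} \le (e - 1)[EX_1 + EX_2] + (e - 1)^2 EX_1 EX_2 + 1,$$

which is sharper than the bound obtained in Section 4. That this is true in general can be seen by noting that

$$\sum_{e_1, \ldots, e_r} \prod_{j=1}^{r} \frac{(-1)^{e_j'} (a_{je_j'} - EX_j)}{a_{j2} - a_{j1}} = 1 \quad\text{and}\quad \prod_{j=1}^{r} \frac{(-1)^{e_j'} (a_{je_j'} - EX_j)}{a_{j2} - a_{j1}} \ge 0,$$

and that the same weights, applied to the vertices $(a_{1e_1}, \ldots, a_{re_r})$, average to $(EX_1, \ldots, EX_r)$. Hence the point $(EX_1, \ldots, EX_r, H(EX))$ lies in the convex hull of the $2^r$ points $(a_{1e_1}, \ldots, a_{re_r}, g(a_{1e_1}, \ldots, a_{re_r}))$, and so $H(EX) \le H^*(EX)$.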
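The bound $H(EX)$ is a multilinear interpolation of $g$ at the $2^r$ vertices. Its coordinate weights are $(a_{j2} - EX_j)/(a_{j2} - a_{j1})$ for $e_j = 1$ and $(EX_j - a_{j1})/(a_{j2} - a_{j1})$ for $e_j = 2$. A short sketch (the function name is ours) computes it and compares it with the Section 4 bound in the bivariate example, assuming $X_1$ and $X_2$ independent.

```python
import itertools
import math


def independent_bound(g, lows, highs, means):
    """H(EX): weight the vertex (a_{1e_1}, ..., a_{re_r}) by the product of
    (a_{j2} - EX_j)/(a_{j2} - a_{j1}) when e_j = 1 and
    (EX_j - a_{j1})/(a_{j2} - a_{j1}) when e_j = 2."""
    total = 0.0
    for choice in itertools.product((0, 1), repeat=len(means)):
        weight, vertex = 1.0, []
        for j, e in enumerate(choice):
            lo, hi, m = lows[j], highs[j], means[j]
            weight *= (hi - m) / (hi - lo) if e == 0 else (m - lo) / (hi - lo)
            vertex.append(lo if e == 0 else hi)
        total += weight * g(tuple(vertex))
    return total


g2 = lambda v: math.exp(v[0] + v[1])
e = math.e
h = independent_bound(g2, [0, 0], [1, 1], [0.3, 0.6])
section4 = (e - 1) * 0.9 + (e - 1) ** 2 * 0.3 + 1   # case EX_2 >= EX_1
print(h, section4)   # h is the smaller of the two
```

The independent bound replaces the $\min(EX_1, EX_2)$ term of Section 4 by the product $EX_1 EX_2$. That product is never larger when both means lie in $[0, 1]$.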
REFERENCES

[1] M. Dresher, S. Karlin, and L. S. Shapley, "Polynomial Games," Ann. Math. Studies, No. 24 (1950), pp. 161-180.
[2] Melvin Dresher, "Moment Spaces and Inequalities," Duke Math. Jour., Vol. 20 (1953), pp. 261-271.
[3] H. P. Edmundson, "Bounds on the Expectation of a Convex Function of a Random Variable," The RAND Corporation, P-982, April 9, 1957.


¹ I wish to thank W. Hoeffding and the referee for their suggestions, which improved this section.





THE NUMBER OF COMPONENTS IN RANDOM LINEAR GRAPHS 


By T. L. Austin, R. E. Fagen, W. F. Penney and John Riordan


Silver Spring, Md.; Hughes Aircraft Co., Silver Spring, Md.; 
Bell Telephone Lab. Inc. 


1. Introduction. Given $n$ distinct points, $m$ selections of pairs of points are made independently and at random. Each of the $\binom{n}{2}$ possible pairs has the same chance $\binom{n}{2}^{-1}$ of selection at each trial. Once selected, a pair is connected by a line joining its two points, labeled by the order number of its selection. Thus, after $m$ selections, a linear graph with $n$ distinct (labeled) points and $m$ distinct (labeled) lines connecting pairs of points is formed. (Note that the rule of formation implies the graph contains no slings but may contain lines in parallel.) In
many investigations it is valuable to have the distribution of the number of con- 
nected components (each isolated point being counted as a component) in such 
a random linear graph. 

In the following this distribution is found. In addition, simple closed expres- 
sions are given for a few special cases of interest, and finally, an approximation 
for the average number of components. 


2. Summary of results. Let $N = \binom{n}{2}$, and let $T_{nmp}$ be the number of graphs (as described above) with $n$ points, $m$ lines and $p$ parts; then, of course, the probability that a graph has $p$ parts is $T_{nmp}/N^m$. Let $C_{nm} = T_{nm1}$ be the number of the corresponding connected graphs (single component) with $n$ points and $m$ lines, and introduce the following generating functions:

$$T(x, y, z) = \sum_n \sum_m \sum_p T_{nmp} \frac{x^n y^m z^p}{n!\, m!} = \sum_{n, m} T_{nm}(z) \frac{x^n y^m}{n!\, m!} = \sum_n T_n(y, z) \frac{x^n}{n!}, \qquad n, m = 0, 1, \ldots,\ p = 1, 2, \ldots, n, \qquad (1)$$

and

$$C(x, y) = \sum_n \sum_m C_{nm} \frac{x^n y^m}{n!\, m!} = \sum_n C_n(y) \frac{x^n}{n!}. \qquad (2)$$

Then

$$T(x, y, z) = \exp[z\, C(x, y)] \qquad (3)$$

is the most concise expression of the relations between enumerators. Since $T_{nm}(1)$ is $N^m$, $T(x, y, 1)$ is known; hence so is $C(x, y)$, and $T(x, y, z)$ is completely determined by (3).
Received July 14, 1958; revised December 16, 1958. 
747 





748 AUSTIN, FAGEN, PENNEY AND RIORDAN 


Indeed, using the abbreviation

$$T_{nk}(m) = \sum \frac{n!}{k_1! \cdots k_n!\, (1!)^{k_1} (2!)^{k_2} \cdots (n!)^{k_n}} \left[ k_1 \binom{1}{2} + k_2 \binom{2}{2} + \cdots + k_n \binom{n}{2} \right]^m, \qquad (4)$$

with summation over all $k$-part partitions of $n$, that is, over all non-negative integral solutions of

$$k_1 + 2k_2 + \cdots + nk_n = n, \qquad k_1 + k_2 + \cdots + k_n = k,$$

it turns out that

$$T_{nmp} = \sum_{k=p}^{n} T_{nk}(m)\, s(k, p), \qquad (5)$$

$$C_{nm} = \sum_{k=1}^{n} (-1)^{k-1} (k - 1)!\, T_{nk}(m). \qquad (6)$$

In (5), $s(k, p)$ is a Stirling number of the first kind defined by

$$(z)_k = z(z - 1) \cdots (z - k + 1) = \sum_p s(k, p)\, z^p. \qquad (7)$$

It is also interesting to notice that

$$T_{nm}(z) = \sum_{k=0}^{n} (z)_k\, T_{nk}(m). \qquad (8)$$
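Equations (4) through (7) can be checked mechanically for small $n$. The sketch below uses helper names of our choosing and proceeds in three steps:

1. It evaluates $T_{nk}(m)$ from (4) by enumerating integer partitions.
2. It builds signed Stirling numbers of the first kind from the recurrence $s(k + 1, p) = s(k, p - 1) - k\,s(k, p)$, which follows from $(z)_{k+1} = (z)_k (z - k)$.
3. It applies (5) and (6).

Exact integer arithmetic is used throughout. The partition enumeration is only practical for small $n$, which is the limitation noted in Section 4.

```python
from math import comb, factorial


def partitions(n, largest=None):
    """Yield the partitions of n as nonincreasing lists of parts."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest


def t_nk(n, k, m):
    """Equation (4): sum over the k-part partitions of n."""
    total = 0
    for parts in partitions(n):
        if len(parts) != k:
            continue
        denom = 1
        for size in set(parts):
            count = parts.count(size)                 # this is k_size
            denom *= factorial(count) * factorial(size) ** count
        lines = sum(comb(size, 2) for size in parts)  # sum_i k_i * C(i, 2)
        total += factorial(n) // denom * lines ** m
    return total


def stirling_first(kmax):
    """Signed s(k, p) with (z)_k = sum_p s(k, p) z^p, as in (7)."""
    s = [[0] * (kmax + 1) for _ in range(kmax + 1)]
    s[0][0] = 1
    for k in range(kmax):
        for p in range(1, k + 2):
            s[k + 1][p] = s[k][p - 1] - k * s[k][p]
    return s


def components_count(n, m, p):
    """T_{nmp} from (5)."""
    s = stirling_first(n)
    return sum(t_nk(n, k, m) * s[k][p] for k in range(p, n + 1))


def connected_count(n, m):
    """C_{nm} from (6)."""
    return sum((-1) ** (k - 1) * factorial(k - 1) * t_nk(n, k, m)
               for k in range(1, n + 1))


print([components_count(4, 4, p) for p in range(1, 5)])
```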


The special cases of (5) of most interest are

$$T_{n,n-1,1} = (n - 1)!\, n^{n-2}, \qquad (9)$$

$$T_{n,n,1} = \tfrac{1}{2}\, n!\, (n - 1)! \left[ 1 + n + \frac{n^2}{2!} + \cdots + \frac{n^{n-3}}{(n - 3)!} + \frac{n^{n-2}}{(n - 2)!} \right]. \qquad (10)$$

Equation (10) depends on the following auxiliary result, which is probably of more interest in graph theory. The number of connected linear graphs with $n$ distinct points and exactly one cycle of length $k$ is $(n)_k\, n^{n-k-1}/2$ for $k > 2$, and $(n)_2\, n^{n-3}$ for $k = 2$. This is a natural extension of Cayley's result, used in (9), that the number of (free) trees with $n$ distinct points is $n^{n-2}$. It is an instance of a more general result appearing in G. W. Ford and G. E. Uhlenbeck [1].
3. Derivation. Consider first the enumerator

$$T_{nm}(z) = \sum_p T_{nmp}\, z^p. \qquad (11)$$

As already noticed, $T_{nm}(1) = \sum_p T_{nmp} = N^m$, since the $m$ lines are chosen independently from the same population of $N = \binom{n}{2}$ pairs.





COMPONENTS IN RANDOM LINEAR GRAPHS 749 


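For orientation, the first few evaluations of (11), obtained by easy enumerations, are as follows:

$$T_{n0}(z) = z^n,$$
$$T_{n1}(z) = N z^{n-1},$$
$$T_{n2}(z) = N z^{n-1} + N(N - 1) z^{n-2},$$
$$T_{n3}(z) = N z^{n-1} + [3N(N - 1) + (n)_3]\, z^{n-2} + [N(N - 1)(N - 2) - (n)_3]\, z^{n-3}.$$

Notice that

$$T_{n0}(z) = z\, T_{n-1,0}(z),$$
$$T_{n1}(z) = z\,[T_{n-1,1}(z) + (n - 1)\, T_{n-2,0}(z)],$$
$$T_{n2}(z) = z \left[ T_{n-1,2}(z) + 2(n - 1)\, T_{n-2,1}(z) + (n - 1)\, T_{n-2,0}(z) + 6 \binom{n-1}{2} T_{n-3,0}(z) \right].$$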


The general form of the recurrence suggested by these may be derived by a slight modification of an argument given by E. N. Gilbert [2]. Consider the graphs enumerated by $T_{n+1,m,p}$, with $n + 1$ labeled vertices, $m$ labeled lines and $p$ parts. In such a graph, the vertex labeled $n + 1$ belongs to a connected part with $i$ other points and $j$ lines, while the remaining $n - i$ points and $m - j$ lines belong to a graph with $p - 1$ parts. Since the labels for the $i$ points and $j$ lines may be chosen in $\binom{n}{i}\binom{m}{j}$ ways, it follows at once that

$$T_{n+1,m,p} = \sum_{i, j} \binom{n}{i} \binom{m}{j}\, C_{i+1,j}\, T_{n-i,m-j,p-1}. \qquad (12)$$


Multiplying by $z^p$ and summing on $p$, it is found that

$$T_{n+1,m}(z) = z \sum_{i, j} \binom{n}{i} \binom{m}{j}\, C_{i+1,j}\, T_{n-i,m-j}(z). \qquad (13)$$

For boundary conditions, note that $T_{1m}(z) = z\,\delta_{0m}$, with $\delta_{00} = 1$ and $\delta_{0m} = 0$ for $m > 0$. For consistency with equation (13), $T_{0m}(z) = \delta_{0m}$, since $C_{1j} = \delta_{0j}$. Note also that $C_{n0} = 0$ for $n > 1$ and $C_{nj} = 0$ for $j < n - 1$. To verify the instances of (13) appearing above, $C_{21} = C_{22} = 1$ and $C_{32} = 6$. For concreteness, it may also be noted that







$$T_{20}(z) = z^2, \qquad T_{2m}(z) = z, \quad m > 0,$$
$$T_{30}(z) = z^3, \qquad T_{3m}(z) = 3z^2 + z(3^m - 3), \quad m > 0.$$
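The recurrence (12) also gives a direct way to tabulate $T_{nmp}$, obtaining $C_{nm}$ at each stage from $T_{nm}(1) = N^m$. The following memoized sketch (names are ours) does this with exact integers. It also computes the mean number of components $M_{nm} = \sum_p p\,T_{nmp}/N^m$ as a fraction. For $m = n$, its output can be compared with the "Exact" column of the table in Section 5.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def pairs(n: int) -> int:
    return n * (n - 1) // 2


@lru_cache(maxsize=None)
def T(n: int, m: int, p: int) -> int:
    """T_{nmp}: graphs with n labeled points, m labeled lines and p parts."""
    if n == 0:
        return 1 if (m == 0 and p == 0) else 0
    if p < 1 or p > n:
        return 0
    if p == 1:
        return C(n, m)
    k = n - 1  # recurrence (12) written for k + 1 = n points
    return sum(comb(k, i) * comb(m, j) * C(i + 1, j) * T(k - i, m - j, p - 1)
               for i in range(k) for j in range(m + 1))


@lru_cache(maxsize=None)
def C(n: int, m: int) -> int:
    """C_{nm}: all N^m graphs minus those with two or more parts."""
    return pairs(n) ** m - sum(T(n, m, p) for p in range(2, n + 1))


def mean_components(n: int, m: int) -> Fraction:
    return Fraction(sum(p * T(n, m, p) for p in range(1, n + 1)), pairs(n) ** m)


for n in range(3, 11):
    print(n, round(float(mean_components(n, n)), 3))
```

The recursion only ever asks for $C_{i+1,j}$ with fewer points than the graph being counted, so it terminates.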
Multiplying (13) by $y^m/m!$ and summing on $m$ leads to

$$T_{n+1}(y, z) = z \sum_i \binom{n}{i}\, C_{i+1}(y)\, T_{n-i}(y, z). \qquad (14)$$

Multiplying (14) in its turn by $x^n/n!$ and summing on $n$ leads to

$$\frac{\partial}{\partial x} T(x, y, z) = z\, \frac{\partial C(x, y)}{\partial x}\, T(x, y, z). \qquad (15)$$

Integrating with respect to $x$, using the boundary conditions $T(0, y, z) = 1$ and $C(0, y) = 0$, gives (3), that is

$$T(x, y, z) = \exp[z\, C(x, y)]. \qquad (3)$$


The further results reported above are obtained directly from (3) as follows. First,

$$T(x, y, z) = [\exp C(x, y)]^z = [1 + T(x, y, 1) - 1]^z = \sum_{k \ge 0} (z)_k\, \frac{[T(x, y, 1) - 1]^k}{k!}, \qquad (16)$$

which may be written compactly as $\exp\{z[T(x, y, 1) - 1]\}$ with $z^k$ interpreted as $(z)_k$.

Next, a basic equation for the Bell multivariate polynomials (see [3], Section 2.8)

$$Y_n(ay_1, \ldots, ay_n) = \sum \frac{n!}{k_1! \cdots k_n!}\, a_k \left( \frac{y_1}{1!} \right)^{k_1} \cdots \left( \frac{y_n}{n!} \right)^{k_n}, \qquad k = k_1 + \cdots + k_n, \qquad (17)$$

with summation over all partitions of $n$, is

$$\sum_n Y_n(ay_1, \ldots, ay_n) \frac{x^n}{n!} = \sum_k \frac{a_k}{k!} \left( x y_1 + \frac{x^2}{2!} y_2 + \cdots \right)^k. \qquad (18)$$

Hence (16) is equivalent to (equating coefficients of $x^n/n!$)

$$T_n(y, z) = Y_n(aT_1, \ldots, aT_n), \qquad a^k \equiv a_k = (z)_k, \qquad (19)$$

with $T_n = T_n(y, 1) = \exp(yN)$. Using (17) for the right-hand side and equating coefficients of $y^m/m!$ gives (8). Introducing the Stirling numbers of the first kind in (8) by use of (7) gives (5).

Finally, the relation (18), together with the instance $z = 1$ of (3), namely $T(x, y, 1) = \exp C(x, y)$, shows that

$$T_n(y, 1) = Y_n(C_1(y), \ldots, C_n(y)), \qquad (20)$$

and the inverse of this (cf. [3], equation 2.51) is

$$C_n(y) = Y_n(fT_1, \ldots, fT_n), \qquad f^k \equiv f_k = (-1)^{k-1} (k - 1)!. \qquad (21)$$

Equating coefficients of $y^m/m!$ again gives (6).







4. Special cases. While the results above are formally complete, they may 
become almost impossibly difficult to write out for large n since summation is 
over all partitions. Special cases obtainable otherwise are a valuable adjunct and 
as already noted, those given by equations (9) and (10) are independently inter- 
esting in the theory of graphs. 

The number $T_{n,n-1,1} = C_{n,n-1}$ is the number of graphs with $n$ labeled points, $n - 1$ labeled lines and 1 part, that is, the number of free trees with all points and lines labeled. The lines and points are labeled independently. The number of free trees with all points (and no lines) labeled is $n^{n-2}$, by Cayley's formula, and the number of line labelings is $(n - 1)!$.

The number $T_{n,n,1} = C_{n,n}$ is obtained in a similar way. Here the graphs consist of a single connected part containing exactly one closed path (cycle), with all points and lines labeled. The essential enumeration is of such graphs with cycle length $k$, and with all points (and no lines) labeled.

These graphs may be enumerated by use of a theorem due to Pólya ([3], Chapter 6), since they may be regarded as formed by placing rooted trees at the vertices of the $k$-sided polygon formed by the cycle. Their enumerator by number of points and number of point labels may be written $d_k(x, y) = \sum_{n,m} d_{nm}(k)\, x^n y^m/m!$. By [3], problem 25 of Chapter 6,

$$d_k(x, y) = D_k(r(x, y), r(x^2), \ldots, r(x^k)), \qquad (16)$$

Here $r(x, y)$ is the enumerator of rooted trees by number of points and number of point labels, $r(x) = r(x, 0)$, and $D_k(t_1, t_2, \ldots, t_k)$ is the cycle index of the dihedral group:

$$2D_k(t_1, t_2, \ldots, t_k) = \begin{cases} Z_k(t_1, t_2, \ldots, t_k) + t_1 t_2^{\,j}, & k = 2j + 1, \\ Z_k(t_1, t_2, \ldots, t_k) + S_2\, t_2^{\,j-1}, & k = 2j, \end{cases}$$

and

$$Z_n(t_1, t_2, \ldots, t_n) = \frac{1}{n} \sum_{d \mid n} \varphi(d)\, t_d^{\,n/d}, \qquad S_2 = S_2(t_1, t_2) = (t_1^2 + t_2)/2.$$

($\varphi(d)$ is Euler's totient function, the number of integers less than $d$ and relatively prime to $d$, with $\varphi(1) = 1$. The sum for $Z_n$, the cycle index of the cyclic group, is over all divisors of $n$, including 1 and $n$.)

Making the substitution $y = z/x$ in the definition of $d_k(x, y)$ changes it to the form $d_k(x, z/x) = d_{k0}(z) + x\, d_{k1}(z) + \cdots$, with $d_{kj}(z) = \sum_n d_{n+j,n}(k)\, z^n/n!$. Hence the numbers required, $d_{nn}(k)$, are enumerated by $d_{k0}(z)$, which is obtained from (16) as

$$d_{20}(z) = r_0^2(z)/2, \qquad d_{k0}(z) = r_0^k(z)/2k, \quad k > 2, \qquad (17)$$

with

$$r_0(z) = \sum_{n=1}^{\infty} r_{nn} \frac{z^n}{n!} = \sum_{n=1}^{\infty} n^{n-1} \frac{z^n}{n!}.$$





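Noting that $r_0(z) = z \exp r_0(z)$, we apply the Lagrange formula

$$f(u) = f(0) + \sum_{n=1}^{\infty} \frac{z^n}{n!} \left[ \frac{d^{n-1}}{du^{n-1}}\, f'(u)\, \varphi^n(u) \right]_{u=0},$$

where $u = z\varphi(u)$. Taking $u = r_0(z)$, $\varphi(u) = e^u$ and $f(u) = u^k$, it gives $d_{k0}(z)$ as

$$d_{k0}(z) = \sum_n \left( (n)_k\, n^{n-k-1}/2 \right) \frac{z^n}{n!}, \quad k > 2, \qquad d_{20}(z) = \sum_n (n)_2\, n^{n-3} \frac{z^n}{n!}.$$

$T_{n,n,1}$ is then obtained as in (10) by attaching line labelings. There are $n!$ labelings for each graph with $k > 2$, but only $n!/2$ for $k = 2$, since interchanging the labels of the two parallel lines of a 2-cycle gives the same graph. Thus $T_{n,n,1} = n!\,[\tfrac{1}{2} d_{nn}(2) + \sum_{k > 2} d_{nn}(k)]$.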
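At the other extreme, it may be noted that, for $m \ge 1$,

$$T_{n,m,n-1} = \binom{n}{2},$$

$$T_{n,m,n-2} = (3^m - 3)\binom{n}{3} + 3(2^m - 2)\binom{n}{4},$$

$$T_{n,m,n-3} = (6^m - 4 \cdot 3^m - 3 \cdot 2^m + 12)\binom{n}{4} + 10(4^m - 3^m - 3 \cdot 2^m + 5)\binom{n}{5} + 15(3^m - 3 \cdot 2^m + 3)\binom{n}{6},$$

and so on.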


5. Average number of components. The average number of components can be computed directly by (3). Let $M_{nm}$ be the average with $n$ points and $m$ lines, and let $A_{nm} = M_{nm} N^m$, $N = \binom{n}{2}$. Then if $A_n(y) = \sum_m A_{nm}\, y^m/m!$, the relation

$$\frac{\partial}{\partial z} T(x, y, z) \Big|_{z=1} = \sum_n A_n(y) \frac{x^n}{n!}$$

follows. Differentiating in (3) leads to $(\partial/\partial z)\, T(x, y, z)|_{z=1} = C(x, y)\, T(x, y, 1)$, whence

$$A_n(y) = (C(y) + T(y))^n, \qquad C^k(y) \equiv C_k(y), \quad T^k(y) \equiv T_k(y, 1). \qquad (22)$$

Recall that $T_n(y, 1) = T_n = \exp(yN)$, and that by (21) $C_n(y)$ is expressible in terms of $T_1$ to $T_n$. Equation (22) therefore leads to an explicit expression for $A_n(y)$, namely

$$A_n(y) = Y_n(bT_1, \ldots, bT_n), \qquad b^k \equiv b_k, \qquad (23)$$

with $b_1 = 1$, $b_k = (-1)^k (k - 2)!$ for $k > 1$, and $T_k$ as above.


While complete, this has the disadvantage of increasing elaboration with n. 
The following alternative development is more easily adapted to asymptotic 
approximation. 







Let $S_1, S_2, \ldots, S_n$ denote respectively the number of components which are single points, isolated connected pairs, isolated connected trios, etc. Then $M_{nm} = E(S_1) + E(S_2) + \cdots + E(S_n)$. Now let

$$S_1 = x_1 + x_2 + \cdots + x_n,$$
$$S_2 = x_{12} + x_{13} + \cdots + x_{n-1,n},$$
$$S_j = x_{12 \cdots j} + \cdots + x_{n-j+1, n-j+2, \ldots, n},$$
$$S_n = x_{12 \cdots n},$$

where $x_1$ is 1 or 0 according as point 1 is isolated or not, $x_{12}$ is 1 or 0 according as points 1 and 2 are connected and isolated or not, etc. Then

$$E(S_1) = nE(x_1) = np_1, \qquad E(S_2) = \binom{n}{2} E(x_{12}) = \binom{n}{2} p_{12}, \qquad E(S_j) = \binom{n}{j} E(x_{12 \cdots j}) = \binom{n}{j} p_{12 \cdots j},$$

where $p_{12 \cdots j}$ is the probability that points $1, 2, \ldots, j$ are connected and isolated.


Then $M_{nm} = np_1 + \binom{n}{2} p_{12} + \cdots + \binom{n}{j} p_{12 \cdots j} + \cdots + p_{12 \cdots n}$, so to estimate the quantity $M_{nm}$ it is necessary only to estimate the probabilities above. To illustrate one approximation which seems quite simple, suppose the approximation is on $M_{nn}$ as a function of $n$. First, the $p$'s can be estimated as follows:

$$p_1 = \left[ \frac{\binom{n-1}{2}}{\binom{n}{2}} \right]^n = \left[ \frac{n - 2}{n} \right]^n \sim e^{-2} \quad\text{and}\quad E(S_1) \sim n e^{-2},$$

which is exact, except for the asymptotic approximation in the last step. Next, $p_{12}$ can be estimated by the following argument. For points 1 and 2 to be connected and isolated, they must be joined either by a single line (forming a tree with two labeled points), or by two lines (forming a graph with a single cycle), or by three or more lines. Thus

$$p_{12} = \binom{n}{1} \frac{1}{N} \left[ \frac{\binom{n-2}{2}}{N} \right]^{n-1} + \binom{n}{2} \frac{1}{N^2} \left[ \frac{\binom{n-2}{2}}{N} \right]^{n-2} + \binom{n}{3} \frac{1}{N^3} \left[ \frac{\binom{n-2}{2}}{N} \right]^{n-3} + \cdots.$$







Using (9) and (10), and noting that all except the first term result in terms $O(1)$ and smaller, $E(S_2) = n e^{-4} + O(1)$. This argument can be continued. In effect, it neglects all terms which result from counting connections of a $j$-tuple by more than $j - 1$ lines, and the typical approximation would be

$$E(S_j) \approx \binom{n}{j} \binom{n}{j-1} \frac{T_{j,j-1,1}}{N^{j-1}} \left[ \frac{\binom{n-j}{2}}{N} \right]^{n-j+1};$$

for $j$ small in comparison with $n$ this can be further approximated by

$$E(S_j) \approx \frac{2^{j-1} j^{j-3}}{(j - 1)!}\, n e^{-2j}.$$
Finally, only those $E(S_j)$ need be used that are of significant size; e.g., for $n < 40$, $ne^{-4} < 1$. To simplify, it is sufficient to take only those terms which are greater than 1 and to estimate the total contribution of all other terms by 1. In effect, this says that the average is contributed most heavily by one large component and isolated points if $n < e^4$, and by one large component plus isolated points plus isolated pairs if $e^4 < n < e^6$, etc. Thus, reasonable approximations are

$$M_{nn} \approx 1 + n/e^2, \qquad n < e^4,$$
$$M_{nn} \approx 1 + n/e^2 + n/e^4, \qquad e^4 < n < e^6,$$
$$M_{nn} \approx 1 + n/e^2 + n/e^4 + 2n/e^6, \qquad e^6 < n < e^8, \text{ etc.}$$
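The rule just stated can be written as a small function. The sketch relies on our reading of the displayed ranges, and the function name is hypothetical. We assume the $j$th term $E(S_j) \approx 2^{j-1} j^{j-3} n e^{-2j}/(j - 1)!$ is kept once $n > e^{2j}$, with the $j = 1$ term always kept, and that all remaining terms are replaced by 1.

```python
import math


def approx_mean_components(n: float, jmax: int = 10) -> float:
    """1 + sum of the terms E(S_j) ~ 2^(j-1) j^(j-3) / (j-1)! * n e^(-2j),
    keeping j = 1 always and j > 1 only when n > e^(2j)."""
    total = 1.0
    for j in range(1, jmax + 1):
        if j > 1 and n <= math.exp(2 * j):
            break  # thresholds grow with j, so no later term qualifies
        total += 2 ** (j - 1) * j ** (j - 3) / math.factorial(j - 1) * n * math.exp(-2 * j)
    return total


for n in (3, 10, 100, 1000):
    print(n, round(approx_mean_components(n), 3))
```

For $j = 1, 2, 3$ the coefficient $2^{j-1} j^{j-3}/(j - 1)!$ equals 1, 1 and 2. These are the multipliers of $n/e^2$, $n/e^4$ and $n/e^6$ in the displayed approximations.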


The following table indicates that these approximations may be satisfactory, at least for $n$ of moderate size (using $M_{nn} = 1 + n/e^2$; values given to 3 places):

$M_{nn}$ = Mean Number of Components

 n    Exact    Approx.   Diff.   Rel. Error
 3    1.111    1.406     .295    .265
 4    1.282    1.541     .259    .202
 5    1.462    1.677     .215    .147
 6    1.642    1.812     .170    .103
 7    1.819    1.947     .128    .071
 8    1.993    2.083     .090    .045
 9    2.166    2.218     .052    .024
10    2.336    2.353     .017    .007

REFERENCES

[1] G. W. Ford and G. E. Uhlenbeck, "Combinatorial problems in the theory of graphs I," Proc. Nat. Acad. Sci. USA, Vol. 42 (1956), pp. 122-128.
[2] E. N. Gilbert, "Enumeration of labelled graphs," Canadian J. Math., Vol. 8 (1957), pp. 405-411.
[3] J. Riordan, An Introduction to Combinatorial Analysis, John Wiley and Sons, New York, 1958.





SEQUENTIAL DESIGN OF EXPERIMENTS 


By Herman Chernoff¹
Stanford University 


1. Introduction. Considerable scientific research is characterized as follows. 
The scientist is interested in studying a phenomenon. At first he is quite ig- 
norant and his initial experiments are preliminary and tentative. As he gathers 
relevant data, he becomes more definite in his impression of the underlying 
theory. This more definite impression is used to construct more informative 
experiments. Finally after a certain point he is satisfied that his evidence is 
sufficient to allow him to announce certain conclusions and he does so. 

While this sequential search for relevant and informative experiments is
common, very little statistical theory has been directed toward it. The
general problem may reasonably be called that of sequential design of experi- 
ments. A truncated variation of this problem called the two-armed bandit 
problem has attracted some attention (see [1] and [5]). Up to now an optimal 
solution for the two-armed bandit problem has not been attained. The failure 
to solve the two-armed bandit problem and certain obvious associated results 
indicate strongly that while optimal strategies are difficult to characterize, 
asymptotically optimal results should be easily available. Here the term asymp- 
totic refers to large samples. For the sequential design problems, large samples 
and small cost of experimentation are roughly equivalent. 

In this paper we present a procedure for the sequential design of experiments 
where the problem is one of testing a hypothesis. Formally, we assume that there 
are two possible actions (terminal decisions) and a class of available experiments. 
After each observation, the statistician decides on whether to continue experi- 
mentation or not. If he decides to continue, he must select one of the available 
experiments. If he decides to stop he must select one of the two terminal actions. 

For the special case where there are only a finite number of states of nature and a finite number of available experiments, this procedure will be shown to be "asymptotically optimal" as the cost of sampling approaches zero. The procedure can be partially described by saying that at each stage the experimenter acts as though he is almost convinced that $\hat\theta$, the current maximum likelihood estimate of the state of nature, is actually equal to or very close to the true state of nature.

In problems where the cost of sampling is not small, this procedure may leave something to be desired. More specifically, until enough data are accumulated, the procedure may suggest very poor experiments because it does not sufficiently distinguish between the cases where $\hat\theta$ is a poor estimate and where $\hat\theta$ is a good

Received July 5, 1958.
¹ This work was sponsored by the Office of Naval Research under Contract N6onr-25140 (NR-342-022).


755 








756 HERMAN CHERNOFF 


estimate. For small cost of experimentation initial bungling is relatively unim- 
portant. It is hoped and expected that with minor modifications the asymptoti- 
cally optimal procedure studied here can be adapted to problems with relatively 
large cost of experimentation. 

The procedures studied make extensive use of the Kullback-Leibler informa- 
tion numbers (see [2] and [4]). 


2. Preliminaries involving two simple hypotheses. The procedure presented in this paper may be motivated by an asymptotic study of the classical problem of sequentially testing a simple hypothesis vs. a simple alternative with only one available experiment. Suppose $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$ are two simple hypotheses, and the experiment yields a random variable $X$ whose density is $f_i(x)$ under $H_i$, $i = 0, 1$. The Bayes strategies are the Wald sequential likelihood-ratio tests. These are characterized by two numbers $A$ and $B$, and consist of reacting to the first $n$ observations $x_1, x_2, \ldots, x_n$

by rejecting $H_0$ if $S_n \ge A$,
accepting $H_0$ if $S_n \le B$,

and continuing sampling as long as $B < S_n < A$, where

$$S_n = \sum_{i=1}^{n} \log [f_1(x_i)/f_0(x_i)]. \qquad (2.1)$$


The appropriate numbers $A$ and $B$ are determined by the a priori probability $w$ of $H_1$ and by the costs. These are the cost $c$ per observation (which is assumed fixed), the loss $r_0$ due to rejecting $H_0$ when it is true, and the loss $r_1$ due to accepting $H_0$ when it is false. The risks corresponding to a sequential strategy are given by

$$R_0 = r_0 \alpha + c\, \mathcal{E}(N \mid H_0), \qquad R_1 = r_1 \beta + c\, \mathcal{E}(N \mid H_1), \qquad (2.2)$$

where $\alpha$ and $\beta$ are the two probabilities of error, and $N$ is the possibly random sample size. Of course $A$ and $B$ are determined so as to minimize

$$(1 - w) R_0 + w R_1.$$


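Suppose that $c$ approaches zero. Then $A$ and $-B$ are large, and Wald's
approximations [6] give

$$\alpha \approx e^{-A}, \qquad \beta \approx e^{B},$$

$$\mathcal{E}(N \mid H_0) \approx -B/I_0, \qquad \mathcal{E}(N \mid H_1) \approx A/I_1, \tag{2.3}$$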


where $I_0$ and $I_1$ are the Kullback-Leibler information numbers given by

$$I_0 = \int \log[f_0(x)/f_1(x)]\, f_0(x)\, dx \tag{2.4}$$

and

$$I_1 = \int \log[f_1(x)/f_0(x)]\, f_1(x)\, dx, \tag{2.5}$$
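For a concrete case (an assumption made here for illustration, not an example from the paper), let $f_0$ and $f_1$ be Bernoulli densities with success probabilities $q_0$ and $q_1$; the integrals (2.4) and (2.5) then reduce to two-term sums. A minimal Python sketch, using natural logarithms and invented values of $q_0$ and $q_1$:

```python
import math


def bernoulli_info(q_a: float, q_b: float) -> float:
    """Information E_a{log[f_a(x)/f_b(x)]} of Bernoulli(q_a) against Bernoulli(q_b).

    Natural logarithms; assumes 0 < q_a < 1 and 0 < q_b < 1.
    """
    return q_a * math.log(q_a / q_b) + (1 - q_a) * math.log((1 - q_a) / (1 - q_b))


# Illustrative success probabilities under H0 and H1 (assumed values).
q0, q1 = 0.3, 0.6
I0 = bernoulli_info(q0, q1)  # (2.4): expectation taken under f_0
I1 = bernoulli_info(q1, q0)  # (2.5): expectation taken under f_1
print(f"I0 = {I0:.4f}, I1 = {I1:.4f}")
```

Here $I_0 \approx 0.184$ and $I_1 \approx 0.192$: the two information numbers generally differ, because the divergence is not symmetric in its arguments.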








SEQUENTIAL DESIGN OF EXPERIMENTS 757 


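and are assumed to exist and to be finite and positive. Minimizing the approximation to
$(1 - w)R_0 + wR_1$, we find that

$$A \approx -\log c + \log[I_1 r_0(1 - w)/w] \approx -\log c, \tag{2.6}$$

$$B \approx \log c + \log[(1 - w)/(I_0 r_1 w)] \approx \log c, \tag{2.7}$$

$$\alpha \approx wc/[I_1 r_0(1 - w)], \qquad \beta \approx c(1 - w)/(I_0 r_1 w),$$

$$\mathcal{E}(N \mid H_0) \approx -\log c/I_0, \qquad \mathcal{E}(N \mid H_1) \approx -\log c/I_1,$$

$$R_0 \approx -c\log c/I_0, \qquad R_1 \approx -c\log c/I_1. \tag{2.8}$$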
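The approximations (2.3) and (2.6)-(2.8) can be evaluated directly. The Python sketch below is an illustration, not part of the paper. It plugs assumed values of $c$, $w$, $r_0$, $r_1$, $I_0$, $I_1$ into these formulas and shows that the ratio of $R_0$ to $-c\log c/I_0$ moves toward one as $c$ shrinks.

```python
import math


def sprt_asymptotics(c, w, r0, r1, I0, I1):
    """Evaluate the small-c approximations (2.3) and (2.6)-(2.8); inputs positive, 0 < w < 1."""
    A = -math.log(c) + math.log(I1 * r0 * (1 - w) / w)   # (2.6)
    B = math.log(c) + math.log((1 - w) / (I0 * r1 * w))   # (2.7)
    alpha, beta = math.exp(-A), math.exp(B)               # Wald: alpha ~ e^{-A}, beta ~ e^{B}
    EN0, EN1 = -B / I0, A / I1                            # (2.3)
    R0 = r0 * alpha + c * EN0                             # risk under H0
    R1 = r1 * beta + c * EN1                              # risk under H1
    return {"A": A, "B": B, "alpha": alpha, "beta": beta,
            "EN0": EN0, "EN1": EN1, "R0": R0, "R1": R1}


# Assumed inputs: equal prior weights and losses, I0 = I1 = 0.2.
for c in (1e-3, 1e-6, 1e-12):
    out = sprt_asymptotics(c, w=0.5, r0=1.0, r1=1.0, I0=0.2, I1=0.2)
    print(c, out["R0"] / (-c * math.log(c) / 0.2))
```

Doubling both $r_0$ and $r_1$ yields exactly the same $A$ and $B$ as halving $c$, which is Remark 3 below in numerical form.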


Remarks: 

1. These results can be verified more rigorously by using Wald's bounds on
his approximations when they apply. Our later results will generalize these
approximations of $R_0$ and $R_1$.

2. The risk corresponding to the optimum strategy is mainly the cost of
experimentation.

3. The optimum strategy and its risks depend mainly on $c$, $I_0$ and $I_1$, and are
relatively insensitive to the costs $r_0$ and $r_1$ of making the wrong decision and to
the a priori probability $w$. Note that doubling $r_0$ and $r_1$ is equivalent to cutting $c$
in half as far as the strategy is concerned. The consequent change in $\log c$, which
determines $A$ and $B$, is relatively small. That is, $\log c$ is changed to $\log c - \log 2$,
while $\log 2$ is small compared to $|\log c|$.

Suppose that the experimenter is given a choice of one of two experiments $E_1$
and $E_2$, but that the one chosen must be used exclusively throughout the
sequential testing problem. We designate the information numbers by $I_0(E_1)$,
$I_1(E_1)$, $I_0(E_2)$ and $I_1(E_2)$. If $c$ is small and $I_0(E_1) > I_0(E_2)$ and $I_1(E_1) > I_1(E_2)$,
then it makes sense to select $E_1$. If, on the other hand, $I_0(E_1) > I_0(E_2)$ and
$I_1(E_1) < I_1(E_2)$, then $E_1$ would be preferable if $H_0$ were “true” and $E_2$ would
be preferable if $H_1$ were. Since the true state of nature is not known, there is no
clear-cut reason to prefer $E_1$ to $E_2$ without resorting to the a priori probabilities
of $H_0$ and $H_1$.

The above, rather artificial, problem illuminates the more natural one where,
after each decision to continue experimentation, one can choose between $E_1$
and $E_2$. If the cost of sampling is very small, it may pay to continue sampling
even though we are almost convinced about which is the true state of nature.
Thus even if we feel quite sure it is $H_0$, we may still be willing to experiment
further, and it would then make sense to select $E_1$ or $E_2$ by comparing
$I_0(E_1)$ with $I_0(E_2)$.

To be more formal, we may select $E_1$ or $E_2$ by comparing $I_0(E_1)$ and $I_0(E_2)$
if the maximum likelihood estimate of $\theta$ based on the previous observations is
$\theta_0$, and by comparing $I_1(E_1)$ and $I_1(E_2)$ if the maximum likelihood estimate is
$\theta_1$. Such a procedure may be short of optimal. On the other hand, if $c$ is very
small, the most damage that could occur would be due to the nonoptimal choice
of experiment for the first few of the many observations which are expected.
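To make the rule concrete, here is a minimal Python sketch with two hypothetical Bernoulli experiments. The success probabilities are invented so that $E_1$ is the more informative experiment when $H_0$ holds and $E_2$ when $H_1$ holds. With only two simple hypotheses, the maximum likelihood estimate is $\theta_1$ exactly when the log-likelihood ratio $S_n$ is positive. Ties are assigned to $\theta_0$ here, which is an arbitrary choice.

```python
import math


def bernoulli_info(q_a, q_b):
    """Kullback-Leibler information of Bernoulli(q_a) against Bernoulli(q_b), natural logs."""
    return q_a * math.log(q_a / q_b) + (1 - q_a) * math.log((1 - q_a) / (1 - q_b))


# Hypothetical experiments: success probability of each under (H0, H1).
EXPERIMENTS = {"E1": (0.5, 0.9), "E2": (0.1, 0.5)}


def information_numbers(name):
    """Return (I0(E), I1(E)) for the named experiment."""
    q0, q1 = EXPERIMENTS[name]
    return bernoulli_info(q0, q1), bernoulli_info(q1, q0)


def choose_experiment(log_lr):
    """log_lr = log[L(theta1)/L(theta0)] for the data so far.

    The MLE is theta1 when log_lr > 0 (ties go to theta0); compare I1 or I0 accordingly.
    """
    index = 1 if log_lr > 0 else 0
    return max(EXPERIMENTS, key=lambda name: information_numbers(name)[index])


print(choose_experiment(-2.0), choose_experiment(3.0))
```

With these numbers $E_1$ has $(I_0, I_1) \approx (0.511, 0.368)$ and $E_2$ has $(0.368, 0.511)$, so the rule picks $E_1$ while the data favour $H_0$ and $E_2$ while they favour $H_1$.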







The stopping rule for the single-experiment case may be naturally interpreted
in terms of a posteriori probability as follows: there are two numbers of
the order of magnitude of $c$; stop experimenting if the a posteriori probability of
$H_0$ goes below the first number or if the a posteriori probability of $H_1$ goes below
the second number. The expected sample sizes are relatively insensitive to variations
in the stopping limits. More specifically, if the a posteriori probability limits
are any numbers of the order of magnitude of $c$, then $\alpha$ and $\beta$ are of the order of
magnitude of $c$, $\mathcal{E}(N \mid H_0) \approx -\log c/I_0$ and $\mathcal{E}(N \mid H_1) \approx -\log c/I_1$.

In view of the standard derivations of the Bayes procedures for the one-experiment
sequential testing problem, it seems natural to stop when the a
posteriori probability of $H_0$ or $H_1$ goes below numbers of the order of magnitude of
$c$. An example of such a stopping rule is obtained by selecting $A$ and $B$ equal
to $-\log c$ and $\log c$, respectively. Then, if $E^{(i)}$ is the experiment selected on the $i$th
trial and $x_i$ is the outcome, let $z_i = \log[f_1(x_i, E^{(i)})/f_0(x_i, E^{(i)})]$, where $f_1(x, E^{(i)})$
and $f_0(x, E^{(i)})$ are the densities of the outcome of $E^{(i)}$ under $H_1$ and $H_0$, respectively.
Finally, after the $n$th experiment, continue sampling only if $S_n = \sum_{i=1}^{n} z_i$ lies between $B$ and $A$.
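A short simulation shows the stopping rule and the experiment-selection rule working together. The sketch assumes two hypothetical Bernoulli experiments: $E_1$ has success probability 0.5 under $H_0$ and 0.9 under $H_1$, and $E_2$ has 0.1 and 0.5. It chooses the experiment by comparing $I_0$ or $I_1$ according to the sign of $S_n$, and stops with $A = -\log c$, $B = \log c$. The seed, the value of $c$ and all probabilities are illustrative.

```python
import math
import random


def bernoulli_info(q_a, q_b):
    """Kullback-Leibler information of Bernoulli(q_a) against Bernoulli(q_b), natural logs."""
    return q_a * math.log(q_a / q_b) + (1 - q_a) * math.log((1 - q_a) / (1 - q_b))


# Hypothetical experiments: success probability under (H0, H1).
EXPERIMENTS = {"E1": (0.5, 0.9), "E2": (0.1, 0.5)}


def run_test(c, true_h, rng, max_steps=100_000):
    """One sequential test with A = -log c, B = log c and an MLE-based choice of experiment.

    Returns (accepted hypothesis 0 or 1, number of observations, per-experiment counts).
    """
    A, B = -math.log(c), math.log(c)
    S = 0.0
    counts = {name: 0 for name in EXPERIMENTS}
    for n in range(1, max_steps + 1):
        k = 1 if S > 0 else 0                     # index of the hypothesis favoured so far
        name = max(EXPERIMENTS, key=lambda e: bernoulli_info(EXPERIMENTS[e][k],
                                                              EXPERIMENTS[e][1 - k]))
        counts[name] += 1
        q0, q1 = EXPERIMENTS[name]
        success = rng.random() < (q1 if true_h == 1 else q0)                 # simulated x_n
        S += math.log(q1 / q0) if success else math.log((1 - q1) / (1 - q0))  # add z_n
        if S >= A:
            return 1, n, counts                   # reject H0
        if S <= B:
            return 0, n, counts                   # accept H0
    raise RuntimeError("no decision within max_steps")


rng = random.Random(12345)
runs = [run_test(1e-6, true_h=0, rng=rng) for _ in range(200)]
print(sum(n for _, n, _ in runs) / len(runs), -math.log(1e-6) / bernoulli_info(0.5, 0.9))
```

Under $H_0$ the average sample size should be close to, and typically somewhat above, $-\log c/I_0(E_1) \approx 27$. The excess reflects overshoot of the boundary and the occasional early use of $E_2$.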


3. A special problem involving composite hypotheses. Comparing two probabilities.
In the preceding section a method was proposed for the sequential
design problem for testing a simple hypothesis vs. a simple alternative. The
situation becomes more complicated when the hypotheses are composite. To
motivate our procedures for this more complex problem, we shall discuss heuristically
the special problem of comparing two probabilities. This problem may
be regarded as a prototype of the general sequential design problem for testing
hypotheses. We shall devote our main attention to the design aspect and leave
the stopping rule in a relatively unrefined state.

It is desired to compare the efficacy of two drugs. The experiments $E_1$ and $E_2$
consist of using the first and second drugs, respectively. The outcomes of these
experiments are success or failure, success having probabilities $p_1$ and $p_2$ in the
two experiments. The two hypotheses are $H_1: p_1 > p_2$ and $H_2: p_1 \le p_2$.

After $n$ observations consisting of $n_1$ trials of drug 1 and $n_2$ trials of drug 2,
which led to $m_1$ and $m_2$ successes respectively, the maximum-likelihood estimate
of $\theta = (p_1, p_2)$ is given by $\hat\theta_n = (\hat p_{1n}, \hat p_{2n}) = (m_1/n_1, m_2/n_2)$. We shall select
our next experiment according to the following idea. Consider the experiment
which would be appropriate if we believed that $\theta$ were $\hat\theta_n$ and we were testing
the hypothesis $\theta = \hat\theta_n$ vs. the simple alternative $\theta = \tilde\theta_n$, which is the “nearest”
parameter point under the “alternative hypothesis.” To be more specific, suppose
$m_1/n_1 > m_2/n_2$. Then $\hat\theta_n$ is an element of the set of $\theta$ for which $H_1$ is true. The
“nearest” element under $H_2$ is not clearly defined. For the present let us define
it as the maximum-likelihood estimate under $H_2$. Then


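$$\tilde\theta_n = \left(\frac{m_1 + m_2}{n_1 + n_2}, \frac{m_1 + m_2}{n_1 + n_2}\right) = \left[\frac{n_1}{n_1 + n_2}\hat p_{1n} + \frac{n_2}{n_1 + n_2}\hat p_{2n},\ \frac{n_1}{n_1 + n_2}\hat p_{1n} + \frac{n_2}{n_1 + n_2}\hat p_{2n}\right].$$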
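The equality of the pooled form and the weighted-average form in the display above is easy to confirm with exact rational arithmetic. A minimal Python sketch; the counts are invented and the function names are illustrative:

```python
from fractions import Fraction


def pooled_estimate(m1, n1, m2, n2):
    """Restricted MLE (p, p) with p = (m1 + m2)/(n1 + n2), in exact rational arithmetic."""
    p = Fraction(m1 + m2, n1 + n2)
    return p, p


def weighted_form(m1, n1, m2, n2):
    """The same point written as a frequency-weighted average of the two sample proportions."""
    p1_hat, p2_hat = Fraction(m1, n1), Fraction(m2, n2)
    avg = Fraction(n1, n1 + n2) * p1_hat + Fraction(n2, n1 + n2) * p2_hat
    return avg, avg


# Invented counts with m1/n1 > m2/n2, so theta_hat lies in H1.
print(pooled_estimate(30, 40, 12, 20), weighted_form(30, 40, 12, 20))
```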










Note that $\tilde\theta_n$ is a weighted average of $(\hat p_{1n}, \hat p_{1n})$ and $(\hat p_{2n}, \hat p_{2n})$, where the weights are
proportional to the frequencies of $E_1$ and $E_2$. If we were testing $\theta = \hat\theta_n$ vs. $\theta = \tilde\theta_n$
and strongly believed in $\theta = \hat\theta_n$, we would select the experiment $E$ for which

$$I(\hat\theta_n, \tilde\theta_n, E) = \sum_x \log[f(x, \hat\theta_n, E)/f(x, \tilde\theta_n, E)]\, f(x, \hat\theta_n, E) \tag{3.1}$$

is as large as possible, where $f(x, \theta, E)$ is the probability of the data for experiment
$E$ when $\theta$ is the value of the parameter.² Thus
$$I[(p_1, p_2), (p_1', p_2'), E_1] = p_1\log(p_1/p_1') + (1 - p_1)\log[(1 - p_1)/(1 - p_1')], \tag{3.2}$$

$$I[(p_1, p_2), (p_1', p_2'), E_2] = p_2\log(p_2/p_2') + (1 - p_2)\log[(1 - p_2)/(1 - p_2')]. \tag{3.3}$$
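Expressions (3.2) and (3.3) give a direct rule for the next trial. The sketch below is an illustration under stated assumptions (invented counts, illustrative names). It computes both informations at $\tilde\theta_n$ and picks the larger. It assumes $m_1/n_1 > m_2/n_2$ and $0 < m_i < n_i$, so that no logarithm of zero arises.

```python
import math


def kl_bernoulli(p, q):
    """p log(p/q) + (1 - p) log[(1 - p)/(1 - q)], the form of (3.2) and (3.3); natural logs."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def drug_informations(theta, phi):
    """Return (I(theta, phi, E1), I(theta, phi, E2)) for theta = (p1, p2), phi = (p1', p2')."""
    (p1, p2), (q1, q2) = theta, phi
    return kl_bernoulli(p1, q1), kl_bernoulli(p2, q2)


def next_drug(m1, n1, m2, n2):
    """Experiment maximizing I(theta_hat_n, theta_tilde_n, E); assumes m1/n1 > m2/n2, 0 < m_i < n_i."""
    theta_hat = (m1 / n1, m2 / n2)
    pooled = (m1 + m2) / (n1 + n2)          # theta_tilde_n = (pooled, pooled)
    i1, i2 = drug_informations(theta_hat, (pooled, pooled))
    return "E1" if i1 > i2 else "E2"


print(next_drug(90, 100, 5, 10), next_drug(9, 10, 50, 100))
```

As the text predicts, heavy use of drug 1 (first call) makes $E_1$ nearly uninformative at $\tilde\theta_n$, so $E_2$ is called for, and vice versa in the second call.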


It is clear from these expressions, and from intuitive considerations, that if
$p_1'$ is close to $p_1$, $E_1$ is relatively uninformative, and if $p_2'$ is close to $p_2$, $E_2$ is
relatively uninformative. Thus if $n_1$ is much larger than $n_2$, $\tilde\theta_n$ is close to $(\hat p_{1n}, \hat p_{1n})$, so that

$$I(\hat\theta_n, \tilde\theta_n, E_1) < I(\hat\theta_n, \tilde\theta_n, E_2)$$

and $E_2$ is called for. Similarly, if $n_1$ is much smaller than $n_2$, $E_1$ is called for.
For a specified $\hat\theta_n$, there is a unique proportion $\lambda(\hat\theta_n)$ such that the two
informations are equal if $n_2/(n_1 + n_2) = \lambda(\hat\theta_n)$. If $n_2/(n_1 + n_2)$ exceeds $\lambda(\hat\theta_n)$, $E_1$ is
called for.

In general, the set of $(p_1', p_2')$ for which

$$I[(p_1, p_2), (p_1', p_2'), E_1] = I[(p_1, p_2), (p_1', p_2'), E_2]$$

is easy to characterize; see Fig. 1. It seems clear that after many observations
$\hat\theta_n$ will be close to $\theta = (p_1, p_2)$ and $\tilde\theta_n$ will be close to the point $\theta^* = \theta^*(\theta) =
(p_1^*, p_2^*)$ for which $p_1^* = p_2^*$ and $I(\theta, \theta^*, E_1) = I(\theta, \theta^*, E_2)$. Furthermore, the
proportion of times that $E_2$ is applied in the long run is determined by the relation
of $\theta^*$ to $\theta$. That is to say, if $\theta = (p_1, p_2)$ and $\theta^* = [(1 - \lambda)p_1 + \lambda p_2,
(1 - \lambda)p_1 + \lambda p_2]$, then $n_2/(n_1 + n_2)$ will tend to be close to $\lambda = \lambda(\theta)$.

The point $\theta^*$ and the ratio $\lambda$ can also be interpreted from another point of
view. Essentially, $\theta^*$ is the point $\varphi$ under the alternative hypothesis which
minimizes $\max_E I(\theta, \varphi, E)$. (In a sense this property can also be interpreted
to say that $\theta^*$ is the “nearest” point to $\theta$ under the alternative hypothesis.)
At $\theta^*$, it does not matter which experiment is selected. On the other hand, if we
regard $I(\theta, \varphi, E)$ as the payoff matrix of a game, and an experiment were to be
chosen to maximize $\min_{\varphi \in a} I(\theta, \varphi, E)$ [$a$ is the set corresponding to the alternative
hypothesis], the randomized maximin strategy would give $E_1$ and $E_2$
weights $1 - \lambda$ and $\lambda$, respectively. Thus $\theta^*$ and $\lambda$ correspond to the solutions of a
two-person zero-sum game with payoff $I(\theta, \varphi, E)$, where one player (say nature)

2 We work against the “nearest” alternative under the intuitive assumption that this
is the alternative which will make our risk large and which we must guard against.







[FIGURE 1. The set of $\theta^* = (p_1^*, p_2^*)$ for which $E_2$ is preferred to $E_1$, i.e., $I(\theta, \theta^*, E_2) > I(\theta, \theta^*, E_1)$, for a specified $\theta = (p_1, p_2)$ in the drug testing problem.]


selects $\varphi$ to minimize $I$ and the other player (the experimenter) selects $E$ to
maximize $I$.
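For $\theta = (p_1, p_2)$ with $p_1 < p_2$, $\theta^*$ and $\lambda$ can be computed numerically. Along the boundary $p_1' = p_2' = p$, the difference $I[\theta, (p,p), E_1] - I[\theta, (p,p), E_2]$ has derivative $(p_2 - p_1)/[p(1-p)] > 0$, so a bisection on $(p_1, p_2)$ locates $p^*$. The weight $\lambda$ then follows from $\theta^* = [(1-\lambda)p_1 + \lambda p_2, (1-\lambda)p_1 + \lambda p_2]$. A Python sketch, assuming natural logarithms; the point $\theta = (0.01, 0.05)$ and the grid check are illustrative choices:

```python
import math


def kl_bernoulli(p, q):
    """p log(p/q) + (1 - p) log[(1 - p)/(1 - q)], natural logs."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def solve_drug_game(p1, p2, tol=1e-12):
    """Return (p_star, lam, value) for theta = (p1, p2), assuming 0 < p1 < p2 < 1.

    p_star equalizes I(theta, (p, p), E1) and I(theta, (p, p), E2); lam is the weight
    on E2 with p_star = (1 - lam) p1 + lam p2; value is I(theta).
    """
    lo, hi = p1, p2
    # g(p) = KL(p1, p) - KL(p2, p) increases on (p1, p2), from negative to positive.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(p1, mid) < kl_bernoulli(p2, mid):
            lo = mid
        else:
            hi = mid
    p_star = 0.5 * (lo + hi)
    lam = (p_star - p1) / (p2 - p1)
    return p_star, lam, kl_bernoulli(p1, p_star)


def worst_case(lam, p1, p2, grid):
    """Nature's best reply on the boundary p1' = p2' to the mixture (1 - lam) E1 + lam E2."""
    return min((1 - lam) * kl_bernoulli(p1, q) + lam * kl_bernoulli(p2, q) for q in grid)


p_star, lam, value = solve_drug_game(0.01, 0.05)
grid = [0.001 + 0.0001 * i for i in range(900)]
print(p_star, lam, value, worst_case(lam, 0.01, 0.05, grid), worst_case(0.5, 0.01, 0.05, grid))
```

At the computed $\lambda$, nature's best reply on the boundary cannot push the information below $I(\theta)$. The equal split $\lambda = 1/2$ does slightly worse, in line with the small loss measured by $e(\theta)$.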

It seems clear that the procedure recommended above will not be substantially
affected, when $c$ is small, if it is modified so that the $(n+1)$st experiment is
$E_1$ with probability $1 - \lambda_n$ and $E_2$ with probability $\lambda_n$, where $1 - \lambda_n$ and $\lambda_n$
correspond to the experimenter's maximin strategy based on the payoff matrix
$I(\hat\theta_n, \varphi, E)$.

In Table 1, we tabulate as functions of $\theta$:

a) $\theta^*(\theta)$, the minimax choice of nature,







TABLE 1
Tabulation³ of $p^*(\theta)$, $\lambda(\theta)$, $I(\theta)$ and $e(\theta)$, where $\theta = (p_1, p_2)$ and $\theta^*(\theta) = [p^*(\theta), p^*(\theta)]$
[Table entries not legible in this copy.]











b) $\lambda(\theta)$, the proportion of times $E_2$ is used in the experimenter's maximin
strategy,

c) $I(\theta) = I(\theta, \theta^*(\theta), E_1) = I(\theta, \theta^*(\theta), E_2)$, the value of the game, and

d) $e(\theta) = \min_{\varphi \in a}[I(\theta, \varphi, E_1) + I(\theta, \varphi, E_2)]/[2I(\theta, \theta^*(\theta), E_1)]$,

which represents a measure of the relative efficiency of using each drug half
the time, compared with the procedure advocated. Evidently there is no great loss of efficiency
in using each drug half the time, and this prototype example is mainly useful as
an illustrative device.


3 By symmetry we need only consider $p_1 \le p_2$ and $p_1 + p_2 \le 1$.







4. Formal description of the general procedures. It is desired to test

$$H_1: \theta \in \omega_1$$

vs. the alternative $H_2: \theta \in \omega_2$. There is available a set of experiments $\{E\}$, each
of which may be replicated. Let $f(x, \theta, E)$ be the density of the outcome $x$ of
experiment $E$ with respect to a measure $\mu_E$. Let

$$I(\theta_1, \theta_2, E) = \int \log\left[\frac{f(x, \theta_1, E)}{f(x, \theta_2, E)}\right] f(x, \theta_1, E)\, d\mu_E(x). \tag{4.1}$$

Designate the $n$th experiment selected by $E^{(n)}$. Although the choice of the
$(n+1)$st experiment may depend on the past, once it is selected, its outcome is
assumed independent of those of the preceding experiments. The maximum-likelihood
estimate of $\theta$ based on the first $n$ observations is designated by $\hat\theta_n$.
If $\theta \in \omega_1$, we call $H_1$ the hypothesis of $\theta$, $H_2$ the hypothesis alternative to $\theta$, and
$a(\theta) = \omega_2$ the set alternative to $\theta$. If $\theta \in \omega_2$, the hypothesis of $\theta$ is $H_2$, the alternative
hypothesis is $H_1$, and the alternative set is $a(\theta) = \omega_1$. Let $\tilde\theta_n$ be the
maximum-likelihood estimate of $\theta$ under the hypothesis alternative to $\hat\theta_n$. If $\omega_1$,
$\omega_2$, or the union $\omega_1 \cup \omega_2$ are not “closed,” we may have some difficulty due to
the nonexistence of these estimates. We shall assume throughout that suitably
closing $\omega_1$ and $\omega_2$ and, if necessary, taking $\hat\theta_n$ and $\tilde\theta_n$ on the boundary of $\omega_1$ and
$\omega_2$ eliminates this difficulty. In particular, in the drug testing problem $\tilde\theta_n$ lies on
the boundary separating $\omega_1$ and $\omega_2$.

Let $E(\theta, \varphi)$ be any experiment which maximizes $I(\theta, \varphi, E)$. We assume that
such information-maximizing experiments exist. Let us regard $I(\theta, \varphi, E)$ as the
payoff matrix of a two-person zero-sum game where one player (seeking to minimize
$I$) selects $\varphi$ among the elements of the closure of $a(\theta)$, the set alternative to
$\theta$, and the other player (seeking to maximize $I$) selects $E$ as an element of $\{E\}$,
the set of available strategies. Now suppose that the experiment $E$ is selected by
some random mechanism corresponding to a probability measure $\lambda$ on $\{E\}$.
Then the corresponding information is $\int I(\theta, \varphi, E)\, d\lambda(E)$. Since experiments
may be selected in such a fashion, we extend the space of available experiments
to include this convex set of all randomized experiments. This set is equivalent
to the class of all randomized strategies of the second player (the experimenter)
in the original game. Thus if the original game had randomized or pure solutions,
the second player in the extended game has a (not necessarily unique) maximin
strategy $E(\theta)$, which is a pure or randomized experiment. The value of the game
is given by

$$I(\theta) = \inf_{\varphi \in a(\theta)} I(\theta, \varphi, E(\theta)). \tag{4.2}$$
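With finitely many alternatives $\varphi$ and just two pure experiments, a randomized experiment is described by the weight $\lambda$ it puts on $E_2$, and $\min_\varphi I(\theta, \varphi, E)$ is a concave, piecewise-linear function of $\lambda$. Its maximum, the value $I(\theta)$ of (4.2), is therefore attained at $\lambda = 0$, $\lambda = 1$, or a crossing point of two payoff lines. A minimal Python sketch under that two-experiment assumption (the payoff numbers in the example are invented):

```python
from itertools import combinations


def maximin_two_experiments(payoff):
    """Exact maximin mixture of two pure experiments against finitely many alternatives.

    payoff maps each alternative phi to (I(theta, phi, E1), I(theta, phi, E2)).
    Returns (lam, value): lam is the weight on E2 and value is the minimum over phi
    of the mixed information, i.e. the game value I(theta).
    """
    rows = list(payoff.values())

    def worst_case(lam):
        # Nature's best reply to the mixture (1 - lam) E1 + lam E2.
        return min((1 - lam) * a + lam * b for a, b in rows)

    candidates = {0.0, 1.0}
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        denom = (b1 - a1) - (b2 - a2)
        if denom != 0:
            lam = (a2 - a1) / denom          # where the two payoff lines cross
            if 0.0 <= lam <= 1.0:
                candidates.add(lam)
    best = max(sorted(candidates), key=worst_case)
    return best, worst_case(best)


# Invented payoffs for two alternatives phi1, phi2.
print(maximin_two_experiments({"phi1": (4.0, 1.0), "phi2": (1.0, 3.0)}))
```

In this example the two payoff lines $4 - 3\lambda$ and $1 + 2\lambda$ cross at $\lambda = 0.6$, which gives the value $2.2$; neither pure experiment alone guarantees more than $1$.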


Let $E^{(n)}$ represent the experiment used for the $n$th trial, $x_n$ the corresponding
outcome,

$$z_n(\theta, \varphi) = \log[f(x_n, \theta, E^{(n)})/f(x_n, \varphi, E^{(n)})], \tag{4.3}$$

$$S_n(\theta, \varphi) = \sum_{i=1}^{n} z_i(\theta, \varphi), \tag{4.4}$$

and

$$S_n = \sum_{i=1}^{n} z_i(\hat\theta_n, \tilde\theta_n). \tag{4.5}$$


We define our procedure A as follows. Stop sampling at the $n$th observation
and select the hypothesis of $\hat\theta_n$ if $S_n > -\log c$. If sampling is to be continued, let
$E^{(n+1)} = E(\hat\theta_n)$, which is to be defined in some unique and measurable way
consistent with the above definition of $E(\theta)$.

The stopping rule described above is rather unrefined. It does not require
much imagination to see how to go about refining it. On the other hand, such
refinements will not be necessary for the asymptotic results we want, and so we
shall not study them here.
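To make the definition concrete, here is a Python sketch of procedure A for a small hypothetical model with four states of nature (two in each of $\omega_1$ and $\omega_2$) and two pure Bernoulli experiments. The model, the seed, the tie-breaking of $\hat\theta_n$ by listing order, and the realization of the randomized experiment $E(\hat\theta_n)$ (draw $E_2$ with the maximin weight, otherwise $E_1$) are all assumptions chosen for illustration. The success probabilities are distinct across states, so (5.3) of Section 5 holds.

```python
import math
import random
from itertools import combinations

# Hypothetical finite model: each state of nature gives the success probability of (E1, E2).
OMEGA1 = {"a": (0.6, 0.45), "b": (0.75, 0.35)}
OMEGA2 = {"c": (0.4, 0.55), "d": (0.5, 0.7)}
STATES = {**OMEGA1, **OMEGA2}


def kl(p, q):
    """Bernoulli information I for one pure experiment, natural logs."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def alternative_set(theta):
    return OMEGA2 if theta in OMEGA1 else OMEGA1


def maximin_weight(theta):
    """Weight on E2 of the experimenter's maximin mixture E(theta) in the game I(theta, phi, E)."""
    rows = [(kl(STATES[theta][0], q1), kl(STATES[theta][1], q2))
            for q1, q2 in alternative_set(theta).values()]

    def worst(lam):
        return min((1 - lam) * a + lam * b for a, b in rows)

    cands = {0.0, 1.0}
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        d = (b1 - a1) - (b2 - a2)
        if d != 0 and 0.0 <= (a2 - a1) / d <= 1.0:
            cands.add((a2 - a1) / d)
    return max(sorted(cands), key=worst)


def log_lik(theta, data):
    """data[j] = [successes, failures] observed so far with experiment E_{j+1}."""
    return sum(s * math.log(p) + f * math.log(1 - p) for p, (s, f) in zip(STATES[theta], data))


def procedure_a(true_theta, c, rng, max_n=100_000):
    """Run procedure A; return (selected hypothesis, number of observations N)."""
    data = [[0, 0], [0, 0]]
    for n in range(max_n + 1):
        ll = {t: log_lik(t, data) for t in STATES}
        theta_hat = max(STATES, key=ll.get)                        # MLE; ties go to listing order
        theta_tilde = max(alternative_set(theta_hat), key=ll.get)  # MLE under the other hypothesis
        if ll[theta_hat] - ll[theta_tilde] > -math.log(c):         # S_n > -log c
            return ("H1" if theta_hat in OMEGA1 else "H2"), n
        j = 1 if rng.random() < maximin_weight(theta_hat) else 0   # randomized E(theta_hat)
        success = rng.random() < STATES[true_theta][j]
        data[j][0 if success else 1] += 1
    raise RuntimeError("no decision within max_n observations")


rng = random.Random(2024)
runs = [procedure_a("a", 1e-4, rng) for _ in range(100)]
print(sum(h != "H1" for h, _ in runs), sum(n for _, n in runs) / len(runs))
```

For this model the maximin weight at the true state "a" is about 0.35 and $I(\theta_0) \approx 0.060$. Section 5 therefore leads one to expect sample sizes of the order of $-\log c/I(\theta_0) \approx 150$ for $c = 10^{-4}$, with wrong decisions being rare.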


5. Asymptotic characteristics of procedure A. In this section we shall evaluate
the asymptotic characteristics of procedure A of Section 4 for the case
where there are only a finite number of states of nature and a finite number of
available (pure) experiments. We shall show that for this procedure the probability
of making the wrong decision is $O(c)$ and the expected sample size is
asymptotically no larger than $-\log c/I(\theta)$. In the next section we shall show
that this is as well as can be done.

The arguments will involve the fact that as $c \to 0$, the required sample size
gets large. For large samples, $\hat\theta_n$ tends to be equal to $\theta$ for all but the first “few”
observations. Then $E^{(n)} = E(\theta)$, and for $\varphi \in a(\theta)$, $\mathcal{E}\{z_n(\theta, \varphi)\} = I[\theta, \varphi, E(\theta)]
\ge I(\theta)$, so the sample size required for $S_n$ to reach $-\log c$ is approximately
$-\log c/I(\theta)$.

For the results in the rest of this paper we make the following assumptions.
There are $s$ possible states of nature, which are divided into the two disjoint sets $\omega_1$
and $\omega_2$. The set of available (pure) experiments is $\{E_1, E_2, \ldots, E_k\}$. The losses
due to making the wrong decision are assumed to be positive. That is, $r(\theta, i)$, which
is equal to the loss due to selecting $H_i$ when $\theta$ is the state of nature, satisfies

$$r(\theta, i) = 0 \ \text{if } \theta \in \omega_i, \qquad r(\theta, i) > 0 \ \text{if } \theta \notin \omega_i, \qquad i = 1, 2. \tag{5.1}$$

If the experiment $E$ yields outcome $x$ and $\theta$ and $\varphi$ are distinct, then

$$z(\theta, \varphi, E) = \log\frac{f(x, \theta, E)}{f(x, \varphi, E)} \tag{5.2}$$

has finite variance⁴ and

$$I(\theta, \varphi, E) = \int \log\frac{f(x, \theta, E)}{f(x, \varphi, E)}\, f(x, \theta, E)\, d\mu_E(x) > 0. \tag{5.3}$$


4 If $z(\theta, \varphi, E_i)$ has finite variance for each of the finite number of pure experiments, it
follows easily that the variance of $z(\theta, \varphi, E)$ is bounded for all randomized experiments.
Similarly, $I(\theta, \varphi, E)$ is bounded away from zero and infinity.







Hereafter we represent the true state of nature by $\theta_0$, which we assume to be in
$\omega_1$. Unless clearly specified otherwise, all probabilities and expectations refer
to $\theta_0$.

Lemma 1: If the stopping rule is disregarded and sampling is continued according
to any measurable⁵ procedure,

$$\hat\theta_n \to \theta_0 \quad \text{w.p.1.} \tag{5.4}$$

In fact there exist $K$ and $b > 0$ such that

$$P\{T > n\} \le Ke^{-bn}, \tag{5.5}$$

where $T$ is defined as the smallest integer such that $\hat\theta_n = \theta_0$ for all $n \ge T$.

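Proof: Assign a priori probability $1/s$ to each state of nature. Then $\hat\theta_n$ is the
value of $\theta$ which maximizes the a posteriori probability $p_n(\theta)$ after the $n$th
observation. Furthermore,

$$\log\left[\frac{p_n(\theta_0)}{p_n(\theta)}\right] = \sum_{i=1}^{n} \log\left[\frac{f(x_i, \theta_0, E^{(i)})}{f(x_i, \theta, E^{(i)})}\right] = \sum_{i=1}^{n} z_i(\theta_0, \theta) = S_n(\theta_0, \theta), \tag{5.6}$$

and $\hat\theta_n = \theta_0$ if $S_n(\theta_0, \theta) > 0$ for all $\theta \ne \theta_0$. It suffices to show that for each
$\theta \ne \theta_0$ there is a number $b > 0$ such that $P\{S_n(\theta_0, \theta) \le 0\} \le e^{-bn}$. But for $t < 0$
we have $e^{tS_n(\theta_0, \theta)} \ge 1$ on the event $\{S_n(\theta_0, \theta) \le 0\}$, so that

$$P\{S_n(\theta_0, \theta) \le 0\} \le \mathcal{E}\{e^{tS_n(\theta_0, \theta)}\} \quad \text{for } t < 0. \tag{5.7}$$

Now $\mathcal{E}\{[f(x, \theta, E)/f(x, \theta_0, E)]^{-t}\}$ is the moment generating function of

$$z(\theta_0, \theta, E) = \log[f(x, \theta_0, E)/f(x, \theta, E)]$$

and is equal to one for $t = -1$ and $t = 0$ and, by convexity, is less than one for
$-1 < t < 0$. Thus for a randomized experiment $E$ where $E_i$ is selected with
probability $p_i$,

$$\mathcal{E}\{e^{-z(\theta_0, \theta, E)/2}\} = \sum_{i=1}^{k} p_i\, \mathcal{E}\{e^{-z(\theta_0, \theta, E_i)/2}\}$$

is bounded below one, and there is a number $b > 0$ such that

$$\max_E \mathcal{E}\{e^{-z(\theta_0, \theta, E)/2}\} \le e^{-b}.$$

Thus

$$\mathcal{E}\{e^{-z_i(\theta_0, \theta)/2} \mid x_1, x_2, \ldots, x_{i-1}\} \le e^{-b}$$

and, conditioning successively on $x_1, \ldots, x_{n-1}$,

$$P\{S_n(\theta_0, \theta) \le 0\} \le \mathcal{E}\{e^{-S_n(\theta_0, \theta)/2}\} \le e^{-bn}.$$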
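The exponential bound in this proof is easy to check numerically. For a discrete experiment, $\mathcal{E}\{e^{-z(\theta_0,\theta,E)/2}\} = \sum_x [f(x,\theta_0,E)\,f(x,\theta,E)]^{1/2}$, so $P\{S_n(\theta_0,\theta) \le 0\} \le e^{-bn}$ with $e^{-b}$ equal to this sum. The Python sketch below makes the simplifying assumption that the same Bernoulli experiment is used on every trial, a special case of the lemma. It compares the exact probability with the bound; the success probabilities are invented.

```python
import math


def half_moment(p0, p):
    """E_{theta0}{exp[-z/2]} for one Bernoulli trial, z = log f(x, theta0)/f(x, theta)."""
    return math.sqrt(p0 * p) + math.sqrt((1 - p0) * (1 - p))


def prob_not_separated(p0, p, n):
    """Exact P{S_n(theta0, theta) <= 0} when all n trials use one Bernoulli experiment.

    p0 is the success probability under the true theta0, p the one under theta.
    """
    up, down = math.log(p0 / p), math.log((1 - p0) / (1 - p))
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(n + 1) if k * up + (n - k) * down <= 0)


p0, p = 0.6, 0.4                  # assumed values, for illustration only
rho = half_moment(p0, p)          # plays the role of e^{-b}
for n in (10, 50, 100):
    print(n, prob_not_separated(p0, p, n), rho**n)
```

The bound is not tight (at $n = 100$ it is roughly five times the exact value here), but it decays geometrically, which is all the lemma needs.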


Lemma 2: For procedure A, the expected sample size satisfies

$$\mathcal{E}(N) \le -[1 + o(1)]\log c/I(\theta_0). \tag{5.8}$$





5 A procedure is considered measurable if, at the $n$th stage, the experiment selected is
a measurable function of the data $x_1, x_2, \ldots, x_n$.







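Proof: We wish to show that, given any $\epsilon > 0$, there is a $c^* = c^*(\epsilon)$ such that

$$\mathcal{E}(N) \le -(1 + \epsilon)\log c/I(\theta_0) \quad \text{for } c < c^*.$$

It is obvious that $N \le \max_{\varphi \in \omega_2}(N_\varphi, T)$, where $N_\varphi$ is the smallest integer for
which $\sum_{i=1}^{n} z_i(\theta_0, \varphi) > -\log c$ for all $n \ge N_\varphi$. In view of Lemma 1, it suffices
to show that for each $\varphi \in \omega_2$ and for each $\epsilon > 0$, there exist $K = K(\epsilon, \varphi)$
and $b = b(\epsilon, \varphi) > 0$ such that

$$P\{N_\varphi > n\} \le Ke^{-bn} \quad \text{for } n > -(1 + \epsilon)\log c/I(\theta_0).$$

To show this it suffices to prove that for each $\varphi \in \omega_2$ and each $\epsilon > 0$, there exist
$K$ and $b > 0$ such that

$$P\left\{\sum_{i=1}^{n} z_i(\theta_0, \varphi) \le -\log c\right\} \le Ke^{-bn} \quad \text{for } n > -(1 + \epsilon)\log c/I(\theta_0). \tag{5.9}$$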
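But

$$\sum_{i=1}^{n} z_i(\theta_0, \varphi) = \sum_{i=1}^{n} [z_i(\theta_0, \varphi) - I(\theta_0, \varphi, E^{(i)})] + \sum_{i=1}^{n} [I(\theta_0, \varphi, E^{(i)}) - I(\theta_0, \varphi, E(\theta_0))] + nI(\theta_0, \varphi, E(\theta_0)).$$

If $\epsilon > 0$, then $z(\theta_0, \varphi, E_j) - I(\theta_0, \varphi, E_j) + \epsilon$ has positive mean and a finite
moment generating function for $-1 \le t \le 0$, for each $\varphi$ and each pure experiment
$E_j$. Hence the left-hand derivative of the moment generating function is positive
at $t = 0$. Thus there are $t^* = t^*(\epsilon) < 0$ and $b_1 = b_1(\epsilon) > 0$ such that

$$\mathcal{E}\{e^{t^*[z(\theta_0, \varphi, E_j) - I(\theta_0, \varphi, E_j) + \epsilon]}\} \le e^{-b_1}.$$

Consequently,

$$\mathcal{E}\{e^{t^*[z(\theta_0, \varphi, E) - I(\theta_0, \varphi, E) + \epsilon]}\} \le e^{-b_1}$$

for each $\varphi$ and $E$. Then, as in our proof of Lemma 1,

$$\mathcal{E}\left\{e^{t^*\sum_{i=1}^{n}[z_i(\theta_0, \varphi) - I(\theta_0, \varphi, E^{(i)}) + \epsilon]}\right\} \le e^{-b_1 n}$$

and

$$P\left\{\sum_{i=1}^{n} [z_i(\theta_0, \varphi) - I(\theta_0, \varphi, E^{(i)})] \le -\epsilon n\right\} \le e^{-b_1 n}. \tag{5.10}$$

Furthermore, it follows from the definition of $T$ that

$$\left|\sum_{i=1}^{n} [I(\theta_0, \varphi, E^{(i)}) - I(\theta_0, \varphi, E(\theta_0))]\right| \le KT,$$

and hence, applying Lemma 1,

$$P\left\{\sum_{i=1}^{n} [I(\theta_0, \varphi, E^{(i)}) - I(\theta_0, \varphi, E(\theta_0))] \le -\epsilon n\right\} \le Ke^{-bn}. \tag{5.11}$$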







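Finally,

$$I(\theta_0, \varphi, E(\theta_0)) \ge I(\theta_0) \quad \text{for } \varphi \in \omega_2. \tag{5.12}$$

Then, combining (5.10), (5.11), and (5.12), we obtain

$$P\left\{\sum_{i=1}^{n} z_i(\theta_0, \varphi) \le n[I(\theta_0) - 2\epsilon]\right\} \le K_1 e^{-b_2 n},$$

from which the desired result (5.9) follows (with $\epsilon$ taken suitably small).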
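Lemma 3: For procedure A, the probability of error (rejecting $H_1$) is $\alpha = O(c)$.

Proof: On the set $A_{n\varphi}$ in the sample space for which we reject $H_1: \theta \in \omega_1$
at the $n$th observation and for which $\hat\theta_n = \varphi \in \omega_2$,

$$\sum_{i=1}^{n} z_i(\varphi, \theta_0) \ge \sum_{i=1}^{n} z_i(\hat\theta_n, \tilde\theta_n) \ge -\log c$$

(the first inequality holds because $\tilde\theta_n$ maximizes the likelihood over $\omega_1$, which contains $\theta_0$), and hence

$$\prod_{i=1}^{n} f(x_i, \theta_0, E^{(i)}) \le c\prod_{i=1}^{n} f(x_i, \varphi, E^{(i)}).$$

Therefore

$$P\{A_{n\varphi}\} = \int_{A_{n\varphi}} \prod_{i=1}^{n} f(x_i, \theta_0, E^{(i)})\, d\mu_{E^{(1)}}(x_1) \cdots d\mu_{E^{(n)}}(x_n) \le c\int_{A_{n\varphi}} \prod_{i=1}^{n} f(x_i, \varphi, E^{(i)})\, d\mu_{E^{(1)}}(x_1) \cdots d\mu_{E^{(n)}}(x_n).$$

The last integral is the probability of the set $A_{n\varphi}$ when $\varphi$ is the state of nature.
Since the sets $A_{n\varphi}$, $n = 1, 2, \ldots$, are disjoint, these probabilities sum to at most one, and thus

$$\alpha = \sum_{\varphi \in \omega_2} \sum_{n=1}^{\infty} P\{A_{n\varphi}\} \le \sum_{\varphi \in \omega_2} c \le sc = O(c).$$

This proof is rather standard and obviously applies to any measurable procedure
with the same stopping rule as procedure A.

Combining Lemmas 2 and 3 we have

Theorem 1: For procedure A, the risk function $R(\theta)$ satisfies

$$R(\theta) \le -[1 + o(1)]\, c\log c/I(\theta) \tag{5.13}$$

for all $\theta$.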

6. Asymptotic optimality of procedure A. We shall state and prove Theorem
2, which together with Theorem 1 will establish the asymptotic optimality of
procedure A in the sense described below.

Theorem 2: Any procedure for which $I(\theta) > 0$ and

$$R(\theta) = O(-c\log c) \quad \text{for all } \theta$$

satisfies

$$R(\theta) \ge -[1 + o(1)]\, c\log c/I(\theta) \quad \text{for all } \theta. \tag{6.1}$$

Combining Theorems 1 and 2, we see that if any procedure does substantially
better than A for some $\theta$, its risk must be of a greater order of magnitude
for some $\theta$. In this sense procedure A is asymptotically optimal.







To prove Theorem 2 we shall use two lemmas. The first will show that, for the
probabilities of error to be small enough,

$$S(\theta_0, \varphi) = \sum_{i=1}^{N} z_i(\theta_0, \varphi) \tag{6.2}$$

must, with large probability, be sufficiently large for all $\varphi$ in $\omega_2$. The second will
show that when $n$ is substantially smaller than $-\log c/I(\theta_0)$, it is unlikely that
the sums $\sum_{i=1}^{n} z_i(\theta_0, \varphi)$ can be sufficiently large for all $\varphi \in \omega_2$.

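Lemma 4: If $\varphi \in \omega_2$, $P\{\text{accept } H_1 \mid \theta = \varphi\} = O(-c\log c)$, $P\{\text{reject } H_1\} =
O(-c\log c)$, and $0 < \epsilon < 1$, then

$$P\{S(\theta_0, \varphi) \le -(1 - \epsilon)\log c\} = O(-c^{\epsilon}\log c). \tag{6.3}$$

Proof: There is a number $K$ such that

$$-Kc\log c \ge P\{\text{accept } H_1 \mid \theta = \varphi\} \ge \sum_{n=1}^{\infty} \int_{A_n} f(x, \varphi)\, d\mu(x),$$

where $f(x, \varphi)$ is the density on the sample space when $\theta = \varphi$, and $A_n$ is the subset
of the sample space for which $S(\theta_0, \varphi) \le -(1 - \epsilon)\log c$ and $H_1$ is accepted
at the $n$th step. Since $f(x, \varphi)/f(x, \theta_0) = e^{-S(\theta_0, \varphi)} \ge c^{1-\epsilon}$ on $A_n$,

$$-Kc\log c \ge \sum_{n=1}^{\infty} \int_{A_n} [f(x, \varphi)/f(x, \theta_0)]\, f(x, \theta_0)\, d\mu(x) = \sum_{n=1}^{\infty} \int_{A_n} e^{-S(\theta_0, \varphi)} f(x, \theta_0)\, d\mu(x) \ge c^{1-\epsilon}\sum_{n=1}^{\infty} P\{A_n\}.$$

Hence

$$\sum_{n=1}^{\infty} P\{A_n\} = O(-c^{\epsilon}\log c).$$

Since $P\{\text{reject } H_1\} = O(-c\log c)$,

$$P\{S(\theta_0, \varphi) \le -(1 - \epsilon)\log c\} \le \sum_{n=1}^{\infty} P\{A_n\} + P\{\text{reject } H_1\} = O(-c^{\epsilon}\log c).$$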
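Lemma 5: If $\epsilon > 0$,

$$P\left\{\max_{1 \le m \le n} \min_{\varphi \in \omega_2} \sum_{i=1}^{m} z_i(\theta_0, \varphi) \ge n[I(\theta_0) + \epsilon]\right\} \to 0 \quad \text{as } n \to \infty. \tag{6.4}$$

Proof:

$$\sum_{i=1}^{m} z_i(\theta_0, \varphi) = \sum_{i=1}^{m} [z_i(\theta_0, \varphi) - I(\theta_0, \varphi, E^{(i)})] + \sum_{i=1}^{m} I(\theta_0, \varphi, E^{(i)}) = A_{1m\varphi} + A_{2m\varphi},$$

where

$$A_{1m\varphi} = \sum_{i=1}^{m} [z_i(\theta_0, \varphi) - I(\theta_0, \varphi, E^{(i)})]$$

is a martingale. Now $A_{2m\varphi} = \sum_{i=1}^{m} I(\theta_0, \varphi, E^{(i)})$ represents $m$ times the payoff
for the game where nature selects $\varphi$ and the experimenter selects some mixture
of his available strategies. Thus $\min_{\varphi} A_{2m\varphi} \le mI(\theta_0) \le nI(\theta_0)$ for $m \le n$. Hence

$$\min_{\varphi \in \omega_2}(A_{1m\varphi} + A_{2m\varphi}) \ge n[I(\theta_0) + \epsilon]$$

implies that $A_{1m\varphi} \ge n\epsilon$ for some $\varphi$, and

$$P\left\{\max_{1 \le m \le n} \min_{\varphi \in \omega_2} \sum_{i=1}^{m} z_i(\theta_0, \varphi) \ge n[I(\theta_0) + \epsilon]\right\} \le \sum_{\varphi \in \omega_2} P\left\{\max_{1 \le m \le n} A_{1m\varphi} \ge n\epsilon\right\}.$$

Since $A_{1m\varphi}$ is a martingale with mean 0, we may apply Doob's extension of Kolmogorov's
extension of Tchebycheff's inequality ([3], p. 315) to obtain

$$P\left\{\max_{1 \le m \le n} A_{1m\varphi} \ge n\epsilon\right\} \le K/(n\epsilon^2) \quad \text{for each } \varphi \in \omega_2.$$

Lemma 5 follows.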
Now we are in a position to prove Theorem 2. Let

$$n_c = -(1 - \epsilon)\log c/[I(\theta_0) + \epsilon].$$

Then

$$P\{N \le n_c\} \le P\{N \le n_c \text{ and } S(\theta_0, \varphi) \ge -(1 - \epsilon)\log c \text{ for all } \varphi \in \omega_2\} + P\{S(\theta_0, \varphi) < -(1 - \epsilon)\log c \text{ for some } \varphi \in \omega_2\}.$$

By Lemma 5, the first term on the right approaches zero. The condition $R(\theta) =
O(-c\log c)$ for all $\theta$ permits us to apply Lemma 4, and so the second term on the
right also approaches zero. Hence $\mathcal{E}(N) \ge -[1 + o(1)]\log c/I(\theta_0)$, and Theorem 2
follows.


7. Miscellaneous remarks. 

1. The asymptotic optimality of procedure A may not be especially relevant
for the initial stages of experimentation, especially if the cost of sampling is high.
At first it is desirable to apply experiments which are informative for a broad
range of parameter values. Maximizing the Kullback-Leibler information number
may give experiments which are efficient only when $\theta$ is close to the estimated
value.

2. It is clear that the methods and results apply when the cost of sampling
varies from experiment to experiment. Here, we are interested in selecting
experiments which maximize information per unit cost.

3. The ideas employed in this paper seem equally valid and applicable to 
problems which involve selecting one of k mutually exclusive hypotheses. 

4. A minor modification of the stopping rule would be to continue experimentation
as long as $\sum_{i=1}^{n} z_i(\hat\theta_i, \tilde\theta_i) < -\log c$. This rule, which involves

$$\sum_{i=1}^{n} z_i(\hat\theta_i, \tilde\theta_i)$$

instead of $\sum_{i=1}^{n} z_i(\hat\theta_n, \tilde\theta_n)$, may occasionally be computationally easier to deal with.
It is not difficult to show that Lemma 2 applies for this stopping rule.
The author has not proved that Lemma 3 also applies.





5. A modification of the experimentation rule is the following. Select $E^{(n+1)}$
so as to maximize⁶ $I(\hat\theta_n, \tilde\theta_n, E)$. It is easy to see that Lemma 3 would apply
for this or for any measurable experimentation rule. While it is expected that
Lemma 2 would apply for some examples, the example below seems to indicate
that it should not apply in general for this modified experimentation rule.
Note that for large samples we will have

$$\sum_{i=1}^{n} z_i(\theta_0, \varphi) \approx n\sum_{j=1}^{k} \frac{m_{jn}}{n} I(\theta_0, \varphi, E_j),$$

where $m_{jn}$ is the number of times $E_j$ is applied in the first $n$ experiments.
Assuming $\hat\theta_n = \theta_0 \in \omega_1$, $\tilde\theta_n$ is that value of $\varphi \in \omega_2$ which minimizes

$$\sum_{i=1}^{n} z_i(\theta_0, \varphi).$$

Thus $\tilde\theta_n$ essentially minimizes $\sum_{j=1}^{k} (m_{jn}/n) I(\theta_0, \varphi, E_j)$. The successive choices
of $\tilde\theta_n$ and $E^{(n+1)}$ correspond to the following strategies of two players of a game.
Player 1 sees what strategy, repeated $n$ times, would have been most effective
against the combination of the past choices of player 2. Player 2 selects $E^{(n+1)}$
as though player 1 would select that most effective strategy. If for this iterative
choice we have

$$\min_{\varphi \in \omega_2}\left[\sum_{i=1}^{n} I(\theta_0, \varphi, E^{(i)})\right]\Big/ n = I(\theta_0) - o(1),$$

Lemma 2 should apply. The following example shows that we cannot always
obtain the above relation if there are more than 3 available experiments. [In
our prototype example, it is quite clear that no such difficulty will arise.] Let
$I(\theta_0, \varphi, E)$ be given by the following table.


TABLE 2
$I(\theta_0, \varphi, E)$, rows $\varphi_1$ and $\varphi_2$
[Table entries not legible in this copy.]


Then our iterative procedure will always lead to $E_1$ or $E_2$. In fact, each will be
used approximately half the time, giving a limiting value of 5.5 for

$$\sum_{i=1}^{n} I(\theta_0, \varphi_j, E^{(i)})/n, \qquad j = 1, 2.$$

On the other hand, $I(\theta_0) = 6$. Clearly this modified experimentation rule is not
as dependable as the one we chose.


6 This rule was essentially the first one suggested in our study of the prototype example 
of Section 3. 







6. The asymptotic study of the problem of testing a simple hypothesis vs. a
simple alternative suggests that it should be possible to refine the stopping rule
for the composite problem. While the main term of the risk should not be affected,
the higher-order terms could probably be improved. Such improvement may be
quite important in the case where $c$ is not very small. A refinement in the stopping
rule would be relevant for problems of testing composite hypotheses even
if the problems do not involve the choice of experiments.

7. In Equation (5.3) we require that $I(\theta, \varphi, E) > 0$. This condition, used in
the proof of consistency in Lemma 1, is not satisfied in the drug testing problem.
There, using drug 1 will give $I(\theta, \varphi, E_1) = 0$ if $\theta = (p_1, p_2)$ and $\varphi = (p_1, p_2')$.
However, this condition can be relaxed if procedure A is modified slightly to
assure consistency. For example, let $\bar E$ be a specified mixture involving each of
the pure experiments. If we use $\bar E$ instead of $E^{(n)}$ whenever $n$ is a perfect square,
we will have the desired consistency so long as, for each $\theta$ and $\varphi$, there is an $E_i$
such that $I(\theta, \varphi, E_i) > 0$. Even this may be relaxed, since it is not necessary to
discriminate between $\theta$ and $\varphi$ if they correspond to the same hypothesis. In fact,
it suffices to have $I(\theta) > 0$ for each $\theta$.


REFERENCES 

[1] R. N. Bradt, S. M. Johnson, and S. Karlin, “On sequential designs for maximizing
the sum of n observations,” Ann. Math. Stat., Vol. 27 (1956), pp. 1060-1074.

[2] Herman Chernoff, “Large-sample theory: parametric case,” Ann. Math. Stat., Vol.
27 (1956), pp. 1-22.

[3] J. L. Doob, Stochastic Processes, John Wiley and Sons, Inc., New York, 1953.

[4] S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Stat.,
Vol. 22 (1951), pp. 79-86.

[5] H. E. Robbins, “Some aspects of the sequential design of experiments,” Bull. Amer.
Math. Soc., Vol. 58, No. 5 (1952), pp. 527-535.

[6] Abraham Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.





COMPLEX REPRESENTATION IN THE CONSTRUCTION
OF ROTATABLE DESIGNS¹


By R. C. Bose and R. L. Carter²


University of North Carolina and Illinois Institute of Technology 


0. Summary. Response surface techniques are discussed as a generalization of
factorial designs, emphasizing the concept of rotatability. It is shown that the
necessary and sufficient conditions for a configuration of sample points to be a
rotatable arrangement of a specified order are greatly simplified if, in the case
of two factors, the factor space is considered as the complex plane. A theorem
giving these conditions is proved, with an application to the conditions governing
the combination of circular rotatable arrangements into configurations possessing
a higher order of rotatability. This is done by showing that certain coefficients
must vanish in the “design equation” whose roots are the (complex) values of
the various sample points. A method is presented by which any configuration of
sample points (for example, some configuration fixed by extra-statistical conditions)
may be completed into a rotatable design of the first order by the addition
of only two properly chosen further sample points.


1. Introduction. Response surface techniques are a generalization of the
well-known factorial principle of experimental design. Since the total set of treatments
in the conventional factorial is the set of all combinations of the factors taken
at fixed levels, the sample points form a rectangular lattice in the factor space
(whose dimension is the number of factors). The physical law relating the
response with the controllable factors may be represented by a $k$-dimensional
surface (taking $k$ as the number of factors) in the $(k + 1)$-dimensional space
defined by the factors and the response; this is known as a “response surface.”
The exploration of this response surface may often be performed more efficiently
if the concept of the factorial design is extended to include any configuration of
sample points whatever within the factor space.

The requirements of experimental design in the chemical industry led to the
work of Box and Wilson in 1951 [2]. Their problem may be stated as follows:
suppose the true response surface, expressed as a function of the $k$ controllable
factors $x_1, x_2, \ldots, x_k$, is

$$\eta = \phi(x_1, x_2, \ldots, x_k), \tag{1.1}$$

i.e., the true response at the $u$th sample point ($u = 1, 2, \ldots, N$) is

Received July 5, 1958; revised January 23, 1959. 

1 This research was supported by the United States Air Force through the Air Force
Office of Scientific Research of the Air Research and Development Command, under
Contract No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of
the United States Government.

2 Formerly of the University of North Carolina.













772 R. C. BOSE AND R. L. CARTER 


$$\eta_u = \phi(x_{1u}, x_{2u}, \ldots, x_{ku}), \tag{1.2}$$


where $x_{iu}$ is the value of the $i$th factor at the $u$th sample point. The observed
response, $y_u$, varies about a mean of $\eta_u$, with a common variance of $\sigma^2$ for all
values of $u$, these $N$ errors being uncorrelated. It is required to find, with a
minimum number of experiments, a maximum or minimum of the response surface
(i.e., an optimum set of operating conditions) within a region of interest in the
$k$-dimensional factor space, which is fixed by the experimental conditions.

It is assumed that the response surface may be represented within a given
sub-region by its Taylor expansion to terms of order $d$, that is,

$$\eta = \beta_0x_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k + \beta_{11}x_1^2 + \cdots + \beta_{12}x_1x_2 + \cdots, \tag{1.3}$$

where, in the subscript of $\beta$, the number of times each factor-number appears is
the appropriate power of that factor (and $x_0$ is conventionally defined as unity).

This problem has been further investigated by Box and Hunter in a recent 
paper [1]. The notation and terminology of the Box-Wilson paper are used, but 
the values of the z;, are subject to the scaling conventions, 


(1.4) > tu = 0, > zi. = N, for all i. 


They have obtained a general expression for the information given by a 
specified design at any point of the factor space, information being defined as 
the reciprocal of the variance of the predicted response at that point, and have 
considered the advantages of using “rotatable” designs, in which the information 
contours are hyperspheres centered at the origin of the k-dimensional sample 
space. This property is not possessed by the conventional factorial designs. 

They have shown that the necessary and sufficient condition for the design 
to be rotatable of order d is that the generating function ¢ of the moments up 
to order 2d, given by 


N 
(1.5) Q=NTD (1+ tet + tote + 8+ + taten)™, 
u=1 
should be of the form 
é 6 9 9 
(1.6) Q = Leanlti + G+ +++ +t)’, 
where a2, are constants independent of t; , 2, --- , & . Denoting the moment 
N 
(1.7) N7D>> afici? --- af 
u=l1 


by [1%, 2°*, --- , k**], they deduce by equating the coefficients tf't7* --- t* 
in (1.5) and (1.6), that for rotatability of order d it is necessary and sufficient 
that 


[1**, 2°, --- ,k™] = 0 if one or more a, are odd, 
> ’ 








CONSTRUCTION OF ROTATABLE DESIGNS 


k 
(1.8) TT a! 
= \a ——;——__ ». _ if all a; are even, 


go? I (40;) 


where a = a + a2 + -++ + a S 2d, and X, is a constant depending on a, but 
independent of the way in which a is partitioned into a , az, --+ , a, . Note that 
Xo = 1, since x» = 1, and dA, = 1 by the scaling convention. A distinction is made 
between an “arrangement” and a “design” of order d, the former being any 
configuration of sample points satisfying the necessary moment properties, 
and the latter being an arrangement which also permits estimation of the con- 
stants in the dth order model. Arrangements not having this property may, 
when properly chosen, be combined to form designs. In particular, any rotatable 


arrangement of the second order can be converted to a design by the addition of 
center points. 


2. Conditions for rotatability in terms of complex variables. In Box and 
Hunter’s paper referred to, necessary and sufficient conditions for rotatability 
are given in terms of real variables. If, however, the factor plane, whose co- 
ordinates (x and y) are the levels of the two factors, is considered as the complex 
plane, some interesting and useful results may be developed. In particular, we 
have the following theorem, which is valid for two-factor rotatable arrangements 
of any configuration whatever (not necessarily based on regular figures or circles) 
and for rotatability of any order. 

THEOREM 1. The necessary and sufficient condition that a two-factor arrangement, 
in which the uth sample point is specified by (tu, Yu), should be rotatable of order 
dis 
(2.1) za 2k = 0, 


u 
a 


for all integers a and b satisfying 0 < a S$ 2d,0 S b < a/2, where z is the complex 
variable x + iy and 2 is its complex conjugate x — iy.* 
Now from the result of Box and Hunter quoted in Section 1, the necessary 


and sufficient conditions for the design to be rotatable of order d are that the 
moment generating function 


N 
(2.2) Q= NTL + tite + tayw)™, 


should be expressible in the form 


d 
2.3) Q= Di das( ti + 6)’, 


where the constants a2, depend upon the design points, but are independent of 
t, and % . Put 


3 The present short proof was suggested by the referee. For an alternative see [4]. 





774 R. C. BOSE AND R. L. CARTER 


(2.4) Zu = Lu + Yu » zu - Zu “> Wu ’ 
‘ 2 2 2 = 6 o —i0 
(2.5) ro = fut yu = 2udu, 2, = re" R=re’* 


, 


so that (r., 6.) are the polar coordinates of the point (z, y). Similarly put 


(2.6) t=t + th, i= t — th, 

(2.7) p=tit+t&h =a, t=pe*, T= pe, 

so that (p, ¢) are the polar coordinates of the point t, , &. Now 
(2.8) 12, + tzu = (tr + tt) (tu — tyu) + (tr — the) (tu + ty) 


- tty + leYu . 
Hence from (2.2) 


= = = 2d 2 z a 
Q=N'*>D/1+ zt ) (t2,, + fz.) 


u=l a=1 


“ S[ (2d @\ srt 
(2.9) =1+> > a en 
u=1 a=1 a b+c=a b 
2d J N 
ot + > E » i (1m, Po #21) | . 


a=l1 b+c=—a u=l 
where 
2d! 
2.1 a. ae 
(2.10) eu 3 


Again from (2.3) and (2.7) 
(2.11) Q = > arp”. 


Since the constants a2, do not depend on ¢; and é and are therefore independent 
of ¢, one sees by comparing (2.9) and (2.11) that the necessary and sufficient 
condition for rotatability of order d is that the quantity 


N 
‘ ‘ b- -b 
(2.12) 2 ed ne ee | 


b+c=—a u=1 


is independent of the arbitrary angle ¢. This is satisfied if and only if 
N 


(2.13) 7 2.2 = 0, unlessb=c, (0 <b6+c=a S 2d), 


u=1 


and this is precisely what Theorem 1 states. 


3. The design equation. We now combine the results of the preceding section 
with a consideration of the elementary symmetric functions of the roots of an 
equation by which the design is specified. For any two-factor arrangement 
whatever of N sample points, whose locations in the complex factor plane are 





CONSTRUCTION OF ROTATABLE DESIGNS 775 


given by 2, 22, °-* , Zw, the design equation is defined as that equation having 
these values as roots. Thus this equation may be written 


(3.1) 0 (2 — 4) (2 — ) +++ (2 — ey) 
= 28 + pe + pe + +++ + pre + pw. 


The relation between the coefficients, p, , in this equation, and the sums of 
the powers of its roots, 8, = )..2%, is given by Waring’s formulas [3], 


(3.2) ae (1 ot, (yt 1,9, «0+, 


where t = }>..¢, and the summation is over all sets of non-negative integers 
(t,t, ---, ty) such that >>, ut, = m. Conversely, 


> 1 
(3.3) nm = ( om 1 i ‘ ae | | - 
: I] qe : 


1p” ° 


where g = >.» q and the summation is over all sets of non-negative integers 


(qi, G2, °** » Gm) Such that >>. UG = m. Alternatively, these quantities may be 
calculated by means of Newton’s recursion, 


8m + PiSm—1 + P28m—2 + ++ + Pmaisi + Pam = 0, m<QN; 
Sm + DiSm—1 + P2o8m—2 + ++ + PvSm—w = 0, m2=N. 


(3.4) 


Now by Theorem 1 the necessary and sufficient conditic 3 for any rotatable 
design of the first order are 


(3.5) s = > 2% = 0, & = > 2 = 0. 
From the formulas above, we have 


(3.6) P= —%, po = (1/2)(si — m2). 


Hence for any first-order rotatable design, both of these coefficients vanish, 
and the design equation is of the form 


(3.7) zy + ps + pa * +--+ + prize + pw = 0, 


the first two powers of z below the highest power being absent. 

This may be generalized to a rotatable arrangement of any order, d, as follows. 
By Theorem 1 all the sums of powers of z, , up to order 2d, vanish; that is, s; , 
8, °*** , S@ are all equal to zero. But (3.3) gives an expression for p,, explicitly 
as a polynomial in 8; , 8, --- , 8m. Thus p,, po, -+- , Poa are also all equal to 
zero, and we have 

THEOREM 2. A necessary condition that an arrangement be rotatable of order d is 
that the first 2d terms after the initial term in the design equation be equal to zero. 

N.B. Now when d = 1, as shown above, this condition is also sufficient, but, 





776 R. C. BOSE AND R. L. CARTER 


for greater values of d, the summations involving the complex conjugate must 
be considered also. At this point, the importance of the circular arrangement 
will become apparent. 


4. Rotatable circular arrangements. The arrangement formed by the N points 
21, 22,°°* , 2w of the complex factor plane, may be called a circular arrangement 
if the N points lie on a circle with the origin as center. If the points also form a 
rotatable arrangement of order d, i.e., satisfy the moment conditions for ro- 
tatability of order d, then they may be said to form a rotatable circular ar- 
rangement of order d. We shall now prove 

TuHeEoreM 3. If the points 2, z.,--: , Zy form a circular arrangement, then 
the necessary and sufficient conditions for them to form a rotatable circular 
arrangement of order d are 


(4.1) a = 90, for 0 < a S 2d. 
Let r be the radius of the circle on which 2 , 22, --- , zw lie. Then 
(4.2) Zuzy = 1’, “umi1.2,--- a. 


Let a and b be integers satisfying 


(4.3) 0 < a S 2d, 0 sb < a/2. 
Then 
(4.4) Dek =D ”. 


It follows from Theorem 1, that conditions (4.1) are necessary and sufficient 
for a circular arrangement to be rotatable of order d, which proves Theorem 3. 

Let (3.1) be the design equation of a circular arrangement. It follows from 
what has been shown in Section 3, that the conditions (4.1) are equivalent to 
Ph ==: =p =0. 

Hence Theorem 3 may be stated in the alternative form 

TueoreM 3A. If N points form a circular arrangement then the necessary and 
sufficient conditions for them to form a rotatable circular arrangement of order d 
are that the first 2d terms after the initial term in the design equation be equal to 
zero. 

Suppose we combine g rotatable arrangements each of order not less than d — 1, 
where the points of the wth arrangement are 


(4.5) Zwl » 2w2y °** » ZwNy » oO = 1, 2, oes 5 
Then by Theorem 1 we have 
(4.6) > do won. = 0, a =1,2,---,2(d—1),0 S$ b < a/2, 


w 


that is, the combined arrangement satisfies the conditions for rotatability up 
to order d — 1. In order that this combined arrangement shall be a rotatable 





CONSTRUCTION OF ROTATABLE DESIGNS 777 


arrangement of order d, the radii and relative orientations of the component 
arrangements must be adjusted so as to satisfy the remaining conditions, 


(4.7) > d wz. = 0, a = 2d — 1, 2d;b = 0,1,-:-,d—1. 


These remaining conditions may also be written 
(4.8) > Zz (2eubou) Sou ad 0, 


where c = a — 2b and has the values 1, 2, --- , 2d — 2, 2d — 1, 2d. Now if the 
component rotatable arrangements are also circular arrangements, the quantity 
ZwuZwu is constant for any fixed w, and u = 1,2, --- , Ny. Let 2wudou = 7. Then 
(4.8) may be written as 


(4.9) > (r? >> 2.) = 0. 


But by (4.6) these conditions are already satisfied for c = 1, 2,--- , 2d — 2. 
Hence the only further conditions are 


(4.10) > D2" =9, > <2 =0. 


Combining this result with Theorem 2 we have: 

TuHeoreM 4. Jf a number of circular rotatable arrangements of order not less than 
d — 1 are combined together, the necessary and sufficient condition that the resulting 
arrangement be a rotatable arrangement of order d is that the first 2d terms after the 
initial term in the design equation be equal to zero. 


5. Combination of first order rotatable circular designs. We can use Theorem 4 
for obtaining rotatable arrangements of the second order by combining suitable 
rotatable circular arrangements of the first order. Box and Hunter [1] have 
shown that the points of a regular ngon with center at the origin constitute a 
rotatable arrangement of the dth order if and only if n 2 2d + 1. Thus points 
of an equilateral triangle or a square (inscribed in circle with the center at origin ) 
constitute a rotatable arrangement of the first order but not of the second. 
However we can combine equilateral triangles or squares to form rotatable 
arrangements of second order. 

Suppose, for example, that we wish to combine m equilateral triangles (each 
of which is a rotatable circular arrangement of the first order) in such a way 
that the combination is a rotatable arrangement of the second order. The design 
equation of the wth equilateral triangle (w = 1, 2,--- , m) is 


(5.1) z — a, = 90. 
The design equation for the combined arrangement is 
(5.2) (2° — a) (2 — am) --+ (2 — an) = 0, 


which may be written 





778 R. C. BOSE AND R. L. CARTER 


2” eve (a + deo +--+ + Gn)z"* 


(5.3) ai 
+ (a2 + 4103 + -+* + Omidm)2”  — +++ +(—1)" aya --- an = 0. 


It follows from Theorem 4, that for the combined arrangement to be rotatable 
of the second order, we require only that 


(5.4) & + 2 +--+ + an = 0. 


For instance, if m = 2, we have a; = —a,, and thus the triangles must form a 
regular hexagon. 


Similarly, if we combine m squares, the mth square having the design equation, 
(5.5) 2‘ — ay = 0, 
the design equation for the combined arrangement is 
2" — (a, + a2 +--+ + amo” 


(5.6) a 
+ (@jQ2 + M03 + +++ + Gmidm)z” — +--+ + (—1)" am --- a, = 0. 


As before, for a rotatable arrangement of the second order, we require only that 
(5.7) a4 +a+---+a, = 0. 


Thus, to any m — 1 squares we can always add the square whose design equa- 
tion is 


(5.8) z+ (a, +a,+--: +an1) = 0, 


in order to make a rotatable arrangement of the second order. 

Since the moments of a rotatable arrangement must equal those of a spherical 
distribution [1], previous work in this field has concentrated on arrangements in 
which the sample points are equally spaced on the surface of a hypersphere (or 
combinations of such arrangements). Thus in the case of two factors only regular 
polygons have been used. One of the authors [4] has by using an iterative process 
calculated a table of circular rotatable arrangements of the first order each with 
seven points, not situated at the vertices of a regular heptagon. It is hoped to 
publish the details of the computational procedure and the table of designs as a 
separate paper. We shall now show how these arrangements may be used as 
building blocks for second order rotatable arrangements. 

Let the design equations of the three arbitrarily selected seven-point designs be 


2’ + put + pe +--+ + pu = 0, 
(5.9) z+ poe’ + pus’ +++ + px = 0, 
z+ pu! + puz + +++ + pax = 0, 


where the terms in z° and 2° are absent, in virtue of Theorem 2. These designs, 
as tabulated, have a unit radius and one sample point on the positive z-axis. 
In order to combine them in such a way that the resulting arrangements is 
second-order rotatable, we must change the radii to 7, , 72, and r;, and rotate 





CONSTRUCTION OF ROTATABLE DESIGNS 779 


the designs through the angles ¢, , ¢2 , and @; respectively. We define the complex 
variables: 


v, = (cos gd; + 7 sind), 
(5.10) V2 = T2(COS d2 + 7 Sin de), 
Vs = 13(COos o3 + 7 Sin ¢;). 


Thus the required transformation is equivalent to multiplying the roots of the 
design equations by 1, , v2 , and v; respectively, and the design equations for the 
transformed designs are 


2 + Pisdiz™ + Puliz’ +--+ Pri = 0, 
(5.11) 2’ + poo’ + pose +--+ + Pav, = 0, 
z’ + pase’ + pause +--+ + pavs = 0. 


The design equation for the combined arrangement of transformed designs is the 
product of these three equations, 


bib 2” + (pisvi + posv2 + pasd2)2™ 
2.12 


+ (pvt + powd + paws)z” +--+ + pupapavivws = 0. 


But, by Theorem 4, in order for the combined arrangement to be rotatable of 

. 18 17 . r . 
the second order, the terms in z and z’ must vanish. Thus the transformations 
must be such as to satisfy the equations 


‘ham Pisi + prsd2 + puis = 0, 
O.1e 
Pui + pads + Paws = 


These equations may be written 
12 3 3 
s P3303 = (pisi + Pav2)', 
(5.14) 3 4 ‘ 
P33 = — (puts + Po2), 


or, eliminating v; from between them, 


(5.15) (Puri + P,w2)* = — (Pai + Pxv?)* 


’ 


where Py, = pis/Pss, Pix = Pos/Pss, Pa = Pru/Pss, and Px» = py/py. The trans- 
formations corresponding to the values of v; and v2 which satisfy this equation 
will yield a rotatable arrangement of the second order; without loss of generality, 
v; may be taken as unity. 


6. Completion of designs. It frequently occurs in experimental practice that 
sample points cannot easily be taken in accordance with a prescribed design, but 
must rather be taken at locations dictated by the experimental conditions. 
Again, the statistician is often faced with the problem of analyzing data collected 
ijn a manner over which he has had no control. In such cases, it is of importance 





780 R. C. BOSE AND R. L. CARTER 


to consider applications of the methods of Section 3 to the specification of a few 
additional sample points which, combined with those already utilized, will result 
in a rotatable arrangement (thereby providing circular information contours). 

For example, suppose that N observations have been made at the points 
(1, 41), (42, Y2), --* , (2w, yw). It will be shown that we may complete this 
configuration into a rotatable design of the first order by taking observations at 
two more points, (%., ya) and (2, y). We define 


(61) A= Di(mtiyn) = Daw, B= D(utim)’= D2, 


where u = 1, --- , N. Since in the final (first-order rotatable) design we must 
have, by Theorem 1, 

(6.2) da = 0, Le = 0, (v = 1,2,---,N,a,b), 
we set -A =z, +2, —B = 2+ 2}. Thus we have: 

(6.3) Zeze = (1/2)(A® + B). 

Hence z, and z are the complex roots of the equation 

(6.4) 2 + Az + (1/2)(A* + B) = 0. 


(If the roots are equal, two observations are made at the corresponding sample 
point. ) 
REFERENCES 

[1] G. E. P. Box ano J. S. Hunter. ‘“Multi-factor experimental designs for exploring re- 
sponse surfaces,’”’ Ann. Math. Stat., Vol. 28 (1957), pp. 195-241. 

(2) G. E. P. Box ann K. B. Witson, “On the experimental attainment of optimum condi- 
tions,’’ J. Roy. Stat. Soc., (Series B), Vol. 13 (1951), pp. 1-45. 

[3] W. S. BurnsipE anv A. W. Panton. Theory of Equations (two volumes), Longmans, 
Green, and Co., London, 1904. 

[4] Rrcuarp Leston Carter, ‘“‘New designs for the exploration of response surfaces,’’ 


Technical Report No. 172, Institute of Statistics, University of North Carolina, 
1957. 





THE UNIQUENESS OF THE L, ASSOCIATION SCHEME! 


By §S. S. SHRIKHANDE 


University of North Carolina 


1. Summary. The J, association scheme for a class of partially balanced in- 
complete block designs determines the parameters of the second kind. This 
paper considers the converse problem: do these parameters imply the L: associa- 
tion scheme? Necessary conditions for the existence of such designs are also 
obtained. 


2. Introduction. A partially balanced incomplete block design [2] with two 
associate classes is said to have Lz association scheme [3], if the number of treat- 
ments is s°, where ¢ is a positive integer, and the treatments can be arranged in 
a(s X s) square such that any two treatments in the same row or the same 
column are first associates, whereas any two treatments not in the same row and 
not in the same column are second associates. The following relations are easily 
seen to hold in this case: 

(1) The number of first associates of any treatment is n, = 2s — 2. 

(2) With respect to any two treatments, 6, and 6 , which are first associates, 
the number of treatments which are first associates of both 6, and @ is 


Pi( Or , %) = 8s — 2. 


(3) With respect to any two treatments, 6; and @,, which are second asso- 
ciates, the number of treatments which are first associates of both @; and 4, is 
Dir( 9s , 6) = 2. 

We examine the converse problem, i.e., whether or not the relations (1), (2) 
and (3) imply that the association scheme is of the LZ. type. We show that the 
converse is true for s = 2, excepting possibly s = 4. Necessary conditions for the 
existence of such designs are also obtained. 

It is worthwhile to recall what is known about other partially balanced designs. 
It is known [1], that if in a partially balanced incomplete block design with two 
associate classes pi: or pi2 = 0, then the design must necessarily be a group 
divisible design. Recently Connor [5], has shown that if in a partially balanced 
incomplete block design with two associate classes v = n(n — 1)/2,n 2 9, 
nm = 2n — 4, pn = n — 2, pin = 4, then the association scheme is triangular. 
In an unpublished thesis [8], Mesner has given corresponding results for the case 
of L, designs, g = 2. The proof presented here for L, is much simpler than that 
given by him. It is also shown that when s = 4, there are only two types of 
association schemes possible. 


Received July 9, 1958; revised December 15, 1958. 

1 This research was supported by the United States Air Force through the Air Force 
Office of Scientific Research of the Air Research and Development Command, under Con- 
tract No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of 
the United States Government. 


781 








782 S. S. SHRIKHANDE 


3. Statement and proof of a lemma. 

Lemma. Let the parameters of the second kind for a partially balanced incomplete 
block design with two associate classes with s° treatments be ny = 28s — 2, pri = 
s — 2 (and hence piz = 8 — 1), pur = 2. Then if s = 2,3 ors > 4, and if the 
l-associates of any treatment @ are o; , $2, °** , Os-1, Vi, W2,°°* » Ws-1, Where 
the set (do, --* , ds-1) 18 the set of common 1-associates of both 0 and ¢, , and the 
set (Yi, +++ , Wes) ts the set of 1-associates 0, which are 2-associates of o, , then any 
two treatments from the set (di, --: , ds-1) are 1-associates. Similarly, any two 
treatments from the set (Yi, +--+ , Ws.) are 1-associates, while any treatment ¢; is a 
2-associate of any treatment Y; , 1,7 = 1,2,---,8s— 1. 

Proor. We will use the notation (@,¢) = 7 to denote that 6 and ¢ are 7-asso- 
ciates, i = 1, 2. We note that the Lemma is trivially true for s = 2. We now con- 
sider the case s = 3. Without loss of generality assume that the 1-associates of 
treatment 1 are 2, 3, 4 and 5, of which 3 is the l-associate of 2, and 4 and 5 are 
2-associates of 2. Then (1,3) = 1, and 2 is the only possible common 1-associate 
of both, and hence 4 and 5 are both 2-associates of 3. It only remains to prove 
that (4, 5) = 1. Suppose, on the contrary, that (4, 5) = 2. Then among the 
l-associates of 1, the treatment 4 has three 2-associates 2, 3, 5 contradicting the 
value pi2 (1, 4) = 2. Hence we must have (4,5) = 1. 


Now consider the case s > 4. For convenience replace 6, ¢: , d2, °-- , d-1, 
vi, We, °** » We Of the lemma by 1, 2, 3, --- , s,s + 1,8 + 2,---, (2s — 1), 
respectively. 

We then have the treatments 2, 3, --- , s,s + 1,---, (2s — 1) for 1l-asso- 
ciates of 1, of which the set 7; = (3, 4, --- , s) is the set of common 1-associates 


of both 1 and 2, whereas the set JT. = (s + 1,8 + 2, --- , (2s — 1)) is the set of 
l-associates of 1 and 2 associates of 2. Let a be any treatment of T, . Then (2, a) 
= 2. Since Pir (2, a) = 2, and 1 is one of the common 1-associates of both 2 and 
a, therefore, a has at most one 1-associate in T,. Since pu(1, a) = s — 2, a 
has at least (s — 3) 1l-associates in T, . But T, contains besides a only s-2 treat- 
ments. Hence a has at most one 2-associate in 7, . Hence, we have the following 
two possibilities. Either (i) with respect to any treatment of 7, every other 
treatment of T. is l-associate, in which case any two treatments of TJ, form a 
l-associate pair, or, (ii) there exists a treatment a of T, such that there is a 
treatment 8 of T, where (a, 8) = 2 and every other treatment of 7: besides 
a and 6 is 1-associate of a. Put T; = T: — (a, 8). Consider the treatment 8. 
Since it can have at most one 2-associate in 7’ and this is a, the set 72 is a set 
of 1-associates of 8. Thus the set 7? is the set of common 1-associates of both a 
and B where (a, 8) = 2. Treatment 1 is also a 1l-associate of both a and 8. The 
set T} and the treatment 1 give a set of (s — 2) treatments which are 1-associates 
of both a and g. But s — 2 > 2. This contradicts the fact that pii(a, 8) = 2. 
Thus this case is impossible. Hence we are left with case (i) only. 

From (i), for every a of T,, the s — 2 treatments of T, excepting a are the 
Pi: = 8 — 2 treatments which are l-associates of both 1 and a. Hence the treat- 
ment 2 and all the treatments of 7; are the (s — 1) treatments which are 2-asso- 





Ly ASSOCIATION SCHEME 783 


ciates of a. Let y be any treatment of 7,. Then (1, y) = 1 and the (s — 1) 
treatments of T, are 2-associates of y. Thus the treatments of 7; are all 1-asso- 
ciates of y. Hence any two treatments from the set 2 and 7 are 1-assoeiates. 
This completes the proof of the lemma. 


4. Statement and proof of the main theorem. 
THeEoreM 1. Jf the parameters of the second kind for a partially balanced incom- 
plete block design with s° treatments with two associate classes are given by 


1 2 ‘ 
m = 2s — 2, Pu = 8 — 2, Pu = 2, 


then the design has Lz association scheme if s = 2,3 or s > 4. 

Proor. The case s = 2 is trivial. We consider the casess = 3 ors > 4. From 
the above lemma, we can write down the l-associates of @ in the following 
scheme. 


6 1 de eee ds-1 
1 
v2 


Vet 


where any two treatments in the first row or in the first column are 1-associates, 
and any treatment ¢ is a 2-associate of any treatment y. Let 5 be any 2-associate 
of 6. We have Pir (6,5) = 2. Hence 6 cannot have more than two 1-associates in 
the set (¢@: , 62, °** , ds-1). Similarly, it cannot have more than two 1-associates 
from the set (¥:, --- , ¥s—1) and the number of 1-associates of 6 from the set of 
@; and y; is exactly 2. Suppose 6 has two 1-associates ¢; and ¢; ; then ¢; and ¢; 
have the s — 2 remaining treatments of the first row and 5 as common 1-asso- 
ciates. But this makes the number pi:(¢;, ¢;) 2 s — 1 > s — 2 which is the 
value of pj, . We thus get a contradiction. Similarly, if 6 has no 1-associate from 
the set (¢@: , --- ,¢s-1), then both these l-associates of 5 must come from the set 
¥i,°°*, We-1, Which will again give a contradiction. Thus 6 has exactly one 
l-associate from the set of ¢,’s and exactly one 1l-associate from the set of y,’s. 
Hence any 5, where (6, 6) = 2, determines uniquely a pair (¢;, ~;) such that 
(¢:, 5) = 1, (Wj, 6) = 1. Conversely we show that any pair (¢; , ¥;) uniquely 
determine a 6 such that (0,5) = 2 and (¢;,6) = (W;, 8) = 1. For suppose there 
are two such 6’s, say 5, and 6, . Then we have the following relations. 


(¢: , Ws) _ 2 
(¢:, 9) = (Wi, 0) = 1 
(¢: , 51) _ (W;, 61) oo (: , 52) - (W;, 52) = 1. 


This gives the value Pi(s , ¥;) = 3 which is a contradiction. Thus the corre- 
spondence between 6 and the pair (¢;, ¥;) is 1 to 1. We can, therefore, put 6 





784 8S. S. SHRIKHANDE 


in the position determined by the column of ¢; and row of y; . Thus the (s — 1)’ 
positions can be uniquely filled by utilizing the (s — 1)’ 2-associates of 6. We 
thus get the following scheme. 


6 hh 2 *** be 
Wi 4 52 coe On4 
2 6, 5.41 aes 52(e—1) 


We—1552_30-4-3509-30-44 pact 8(s—-1)2 


Then all the 1-associates of ¢; are exactly the treatments in the row and column 
corresponding to it. A similar result is true for any ¥;. Now consider y, . Its 
l-associates are contained in the second row and first column. Among these 
l-associates the elements ¥2,--- y,—; are the common 1-associates of ¥; and @, 
whereas 6; , 52, --- , 5,-, are the l-associates of ¥, and 2-associates of 6. Hence 
the application of the lemma gives the result that any two treatments in the 
second row are l-associates. Similarly, we get the result that any two treatments 
in the second column are l-associates. A similar result is obviously true for any 
other row or any other column. Thus for any treatment whatsoever, all its 
l-associates are obtained by taking the treatments in the row and column cor- 
responding to that treatment. Hence any two treatments which are neither in 
the same row nor in the same column are 2-associates. This completes the proof 
of the theorem. 


5. Some known results on rational equivalence of matrices and Hilbert norm- 
residue symbol. Let A and B be two symmetric matrices of order n with elements 
in the rational field. The matrices A and B are rationally equivalent, written 
A — B, if there exists a nonsingular C with elements in the same field, such 
that A = C’BC. The congruence of matrices satisfies the usual requirements of 
an “equals” relationship. 

If A is an integral symmetric matrix of order and rank n, we can always con- 
struct an integral diagonal matrix D = (d,,--- ,d,), dj; # 0,7 = 1,2,---,n, 
such that D — A. The number of negative terms i, called the index of A, is an 
invariant of A by Sylvester’s Law. 

Define d = (—1)‘5, where 6 is the square-free positive part of |A|. Then since 
|B| = |C|’ |A|, d is another invariant of A. 

Now let A be a nonsingular and symmetric integral matrix of order n. Let 
D, be the leading principal minor determinant of order r and suppose that 
D, ¥ 0, r = 1, 2, --- , n. Define 





n—l 
(5.1) ¢,(A) = (-1, -D.) I (D;, —Dj4:) 
Xu 


for every odd prime p where (m, m’), is the Hilbert norm residue symbol. Then 
we have the following results [4], [1]. 





Lz ASSOCIATION SCHEME 785 


THEoreM (A). If m and m’ are integers not divisible by the odd prime p, then 
(5.2) (m, m’), = +1 
(5.3) (m,P)p = (p, mM)» = (m/p) 
where (m/p) is the Legendre symbol. Moreover if m = m' # 0 mod p, then 
(5.4) (m, P)p = (m’, p)p. 

THEOREM (B). For arbitrary non-zero integers m, m', n, n’ and every prime p, 
(5.5) (—m, m)>, +1 
(5.6) (m,n)p = (n,m)» 
(5.7) (mm’,n)p = (m,n)p(m’, n)> 
(5.8) (mm’,m — m’), = (m, —m’),. 


From the above it is easy to verify that for p an odd prime and every positive 
integer m 


(5.9) (m,m + 1)» (-—1,m+ 1), 


(5.10) I G,i+ 1)p = ((m+ 1)!, —1)p. 


The fundamental Minkowski-Hasse Theorem states: 

TuHEeoreM (C). Let A and B be two integral symmetric matrices of order and 
rank n. Suppose that the leading principal minor determinants of A and B are all 
different from zero. Then A — B, if and only if A and B have the same invariants 
i, d, and c, for every prime p including ~. 

In the rest of this paper, “‘p” stands for an odd prime and will generally be 
omitted in writing the Hilbert norm-residue symbol. 


6. Necessary conditions for the existence of symmetrical P.B.I.B. designs 
with v = s*, n, = 2s — 2, pn = 8 — 2, pr: = 2, when s => 3 and s = 4. Con- 
sider the symmetrical design with parameters 


b= s, r= k,dy,A2, ™m = 2s — 2, m = (s — 1)? 
(6.1) pn =s—2, pe=s—1, pm = (s — 1)(s — 2) 
Pu = 2, Pi =2s8—4, pm = (s — 2)’, 82 3,8 ¥ 4. 
Then we have r(r — 1) = 2(s — 1). + (8 — 1)", or 
(6.2) r= [r + (s — 1)A] + (8 — 1) + (8 — 1)Ad). 
Let N = (n;;) be the incidence matrix of the design where 
ni; = 1 if treatment 7 occurs in block j 


0 otherwise. 








786 5. S. SHRIKHANDE 


Then by renumbering the treatments, if necessary, and using Theorem 1, 
we have 


AB :: B 
(6.3) NN' = . 2 a 7 
a a oa, Teh 


where A is an s X s symmetric matrix with r in the main diagonal and \, else- 
where and B is another s X s symmetric matrix with A, in the main diagonal 
and )» elsewhere. By a succession of elementary transformations on rows of 
NN’ considered as a partitioned matrix and the same elementary transforma- 
tion on columns of NN’ and using only the rational numbers it is easy to verify 
that 


am 0 + 0 0 ) 
0 2-3(A — B)-- 0 0 
(64) NNwT=| dnd ac : 
| 0 0 --(s—1)s(A — B) 0 | 
{ 0 0 + 0 s(A + (s — 1)B)} 
Put 
(6.5) P = (r—&) + (8 — 1)(4 — a) 
(6.6) Q=r—WAw+r 
(6.7) A =A — Ae, =A + (8 — Ire. 
Then it is easy to verify that 
(6.8) |A — B| = QP 
(6.9) |A + (s — 1)B| = rp. 
Hence 
(6.10) IT| = r(s!)*Qe Pr, 


Since NN’ is semipositive definite, so is 7. Hence we have P 2 0 and Q 2 0. 
Further |VN’| = |N|’ is a perfect square. Hence |7' is a perfect square. Thus, 
if P > 0 and Q > 0, which means that N is nonsingular, a necessary condition 
for existence of the design when s is even is that Q must be a perfect square. 
In what follows we assume that P > 0 and Q > 0. This result can also be ob- 
tained by using the results of Connor and Clatworthy [6]. 

Let 


\ 


1-2(A — B) 0 “¢ 
0 2-3(A — B) -- 0 


o 


| 
(6.11) % = | 
| 
) 


0 0 -+ (s—1)s(A — B) 





Lz. ASSOCIATION SCHEME 


and 
(6.12) T, = s(A + (8s — 1)B). 
Then 


; Ti: O 
(6.13) T = a ha 
2 


Further, if R is the (s — 1) X (s — 1) diagonal matrix 
(6.14) R = diag {1-2, 2-3, --- , (s — 1)s}. 
Then 
(6.15) T, = Rx(A — B) 


when x denotes the Kronecker product of the matrices. It is easily verified, using 
the results of section 5, that 


(6.16) IR| = ((s — 1)!)’s and 
(6.17) c(R) = 1. 


We now evaluate the values of c(A — B) and c(A + (s — 1)B). 
Following [1, p. 379] we get 


c(A — B) = (Q, —1)** (PQ, d)(P, Q)’. 
Now, since P > 0,Q > O and P — Q = sv ¥ 0 we get from (5.8) 
(PQ, ) = (PQ, P — Q)(PQ, s) 
= (P, —1)(P, Q)(P, s)(Q, s) 


c(A — B) = (Q, —1)""?"(P, Q)*"(P, -1)(2, 8)(Q, 8) 
(6.18) = (P, —1)(Q, —1)*”*(-P, Q)""(-1, Q)""(P, 8)(Q, 8) 
(P, —1)(Q, A orn —P, Q)*""(P, s)(Q, 8). 


Again following [1, p. 379] we get c(A + (s — 1)B) = (P,—1)"?"(P,N) (7, N’). 
Since r° — P = s\’ ¥ 0, 


c(A + (s — 1)B) = (P, -—1)""??(rP, W’) 
(P, —1)*?? (PP, r — P)(7'P, 8) 
(6.19) (P, —1)°°? (PP, r — P)(r’P, 8) 
= (P, -1)"""(r*, —P)(1’, 8)(P, 8) 
= (P, -1)*"""(P, 8). 
Since T; = Rx (A — B) from [9] we have 





788 S. 8. SHRIKHANDE 


C(T;) i [e(R)]*{e(A oa B)|""(\A oni Bi, = re «jy? 
(\R|, |A — BI). 
Substituting the values obtained above we get after some simplification 
(6.20) e( T:) ms (P, —j)ery —P, Q)*(P, s)*(s, —1)°%° 0", 
Similarly from [7] we have 
c(T:) = c(A + (8 — 1)B)(s, —1)***?"(s, |A + (8 — 1)B\)*™ 
(6.21) eo at 
= (P, . l )" s-- 2 Pp. s)‘(s, —1)° #+1)/2 


after some simplifications. Also we have 


(6.22) |7',| an (s odie ifr yo 
(6.23) \Te| = r’s’P*, 
Since 


is 
T= 
0 Ts 


is the direct sum of 7, and 7,, we have [1] c(7) = c(T1)e(T2)(\Ti|, |T2|). 
Substituting the values already found out it is easy to verify that 


(6.24) c(T) = (PQ, -1)™™. 
Since J ~ NN’ ~ T and c(J) = +1 for all odd prime p, we must have 
(PQ, —1);" = 1 for all odd prime p. 


If s is odd the above relation is always true. For s even we have the relation 
(PQ, —1) = lor (P, —1)(Q, —1) = 1. But when s is even, a necessary con- 
dition for existence is that Q be a perfect square. Hence we get the further neces- 
sary condition for existence, i.e.,(P, —1), = +1 for all odd prime p. We can 
thus state the following theorem. 

THEOREM 2. A necessary condition for existence of the symmetric partially 
balanced incomplete block design satisfying (6.1). 

(i) P 2 0,Q 2 0, and 

(ii) if s is even and P # 0, Q # O, then Q must be a perfect square, 
and (P, —1), = 1 for all odd prime p. 

M. N. Vartak ([{10]) considers a similar problem for a 3-associate class of 
partially balanced designs. 


7. Association scheme for the case s = 4. Consider the partially balanced 
incomplete design with the following parameters 


vy = 16, ™% = 6, Mm =9 
(7.1) pu = 2, pr =3, pe = 
Pit = 2, Diz = 4, Dre = 4 





L_ ASSOCIATION SCHEME 789 


Let (a; , a2) = 1 and a;, a, be the common |-associates of both a; and a: . 
Then we have either 


case (i) (a3,a) = 1, or 
case (ii) (az, a) = 2. 


Consider case (i). Let as, ae, a7 be the remaining 1l-associates of a, ; then 
these are obviously 2-associates of a2 , giving the following scheme: 


a4 


Now any two treatments of the first row are l-associates and hence ag , ag , a7 
will be 2-associates of a; and a,. Since az, a3, a, are 2-associates of as ; ag, a7 
must be l-associates of a; . Similarly a; , a; are l-associates of as. Hence any 
two treatments in the first column are 1l-associates. It now follows, as in the 
proof of Theorem 1, that the association scheme is of L, type. Hence if 6, and 
B: are any two treatments which are l-associates, and 8; , 8, are common 1-as- 
sociates of them both, then (8; , 8;) = 1, ie., if case (i) holds for any one pair 
of l-associates, it must hold for all such pairs. 

We now consider case (ii). Replace treatments a; , a2, +--+ , a by 1, 2,--- ,7 
for sake of convenience, giving the scheme 


1 2 3 
5 
6 
7 


Considering the pair (1, 3) and the value pi;(1, 3) = 2, we see that 3 has just 
one l-associate from the set (5, 6, 7). Without loss of generality assume that 
(3, 6) = 1 and hence (3, 5) = (3, 7) = 2. Consider the pair (3, 4). Here 1 
and 2 are 1-associates of both 3 and 4, accounting for the value pj,(3, 4) = 2. 
Hence since (3,6) = 1, (4,6) = 2. Now (6,2) = (6,4) = 2, and (6, 3) = 1. 
Hence from the values pi:(1, 6) and pj2(1, 6) we see that 6 has just one 1-asso- 
ciate and one 2-associate from the set (5, 7). Let (6, 5) = 1 and (6,7) = 2. 
Now 2, 3 and 6 are 2-associates of 7; hence considering (1, 7) = 1 and the 
values pi2(1, 7) and pu(1, 7), we see that 4 and 5 are 1-associates of 7. Now 
l-associates of 5 are 6 and 7 accounting for the value pi:(1, 5). Hence 4 must 
be a 2-associate of 5. We can summarize the above information in the following 
table, where the entry in row @ and column 8 gives the value of (a, 8), where 





790 S. 8S. SHRIKHANDE 


a ~ 6, and * along the main diagonal indicates that no treatment is either 
l-associate or 2-associate of itself. 








aA Ie Oe ee 
‘i 1 1 1 1 1 1 | 
ae * l 1 2 es 
3 I 1 * 2 2 ] 2 | 
(7.2) Sa ee fo FS on Merge 5, «4 
Te f° 2 ee ee 
7:4 2 l 2 1 Pos 
7 i ee ae 1 2 m7 


Thus with respect to treatment 2, l-associates of 1 can be exhibited in the fol- 
lowing scheme 8; : 


l 3 3 4 
5 
S.: 
6 
7 


where treatments in the same row or column are l-associates unless they are 
both “‘under’’-lined in which case they are 2-associates. Treatments not in the 
same row or column are 2-associates, unless they are both first or both second 
members of ‘‘under’’-lined pairs, in which case they are 1-associates. 

We will adopt the convention of writing down the 1-associates of any treat- 
ment 8; (here 1) with respect to any l-associate treatment 62 (here 2) in the 
scheme of the above type, which will bring out the association relationship 
amongst all the treatments involved in the scheme. 

Now amongst the treatments 1, 2,--- , 7, only the treatments 1, 3, 4 are 
l-associates of 2. Let the remaining l-associates of 2 be 8, 9, 10. Then writing 
the row 


2 ] 3 + 


we see that only one of the treatments 8, 9, 10 is a 1-associate of 3. Without loss 
of generality let (3,9) = 1. Then 9 has just one 1-associate from the set (8, 10). 
Let (9, 8) = 1, and hence (9, 10) = 2. Hence, referring to S, for comparison 
we can write down the scheme S, : 





Se ° 


We can now indicate the relations implied by S, and S, in the following table: 





L2. ASSOCIATION SCHEME 


or 


| 


no # = =| oO 
n= 
wein 


nN bo 

—_—me #£ DD WO 

wo €#— NK Nw KIO 
ewe — = bo 


l 


* 


1 2 


9 
10 


oe 
— — b> 
KX NONeKE NNW #€#N |S 





bo 
bo 


| 
| 
| 
| 


We now consider the association relationship of any treatment from the set 
(5, 6, 7) with any treatment from the set (8, 9, 10). 

Consider (2, 6) = 2. Treatments 1, 3 are common |-associates of both 2 and 
6, and pii(2, 6) = 2. Hence the remaining 1-associates of 2, i.e., 4, 8, 9, 10 are 
2-associates of 6. Thus if we combine S; and S, into a new scheme 8; : 


3 4 


we see that all the treatments of the second column are 2-associates of 6. Simi- 
larly (2,7. = 2, and 1 and 4 are common l|-associates of both. Hence 3, 8, 9, 10 
are 2-associates of 7. Hence again all the elements in the second column are 2-as- 
sociates of 7. Again (1,9) = 2; and 2, 3 are common |-associates of both. There- 
fore, the remaining 1-associates of 1 are 2-associates of 9. Hence all the treat- 
ments in the first column are 2-associates of 9. Similarly they are 2-associates 
of 10. Now (1, 8) = 2, and 3, 4, 6, 7 are 2-associates of 8, giving pi2(1, 8) = 4. 
Hence the remaining treatment, i.e., 5 must be a 1-associate of 8. 
These relations are summarized below. 


(5,8) =1, (5,9) = (5, 10) 
(7.4) (6, 8) (6,9) = (6, 10) 
(7,8) = (7,9) = (7, 10) 


A complete concise explanation of S; can, therefore, be given as follows: 
‘Treatments in the same row or column are 1-associates unless they are both 
‘under’-lined, in which case they are 2-associates. Treatments not in the same 
row or column are 2-associates unless they are both first or both second members 
of ‘under’-lined pairs, in which case they are 1-associates.”’ We utilise this method 
of combining two schemes to get new relations. 

Now among the treatments 1, 2, --- , 10, the treatments 1, 2, 6, 9 are 1-associ- 





792 S. S. SHRIKHANDE 


ates whereas 4, 5, 7, 8, 10 are 2-associates of 3. Let the remaining 1-associates of 
3 be 11 and 12. The common 1-associates of 1 and 3 are 2 and 6, where (2, 6) =2. 
Hence we write down the row 


3 1 2 6. 





Of the remaining treatments 9, 11, 12, we know that (2, 9) = 1. Hence 9 is 
placed in the third position in the column for 3. Again let (9, 11) = 1 and 
(9, 12) = 2. Then we have the scheme 


2s Se 
11 
| 9 
112. 





S4 ° 


Similarly completing the scheme for 1 3 2 6 and utilizing the relations al- 
ready obtained we have 





S, and S; can be combined into 


Beyragediy Boieg 
Bats rd 
Se: ha ell 

2 |. 


From S,, Ss , Ss we get the following relations. 


(1,11) = (1,12) =2 
(2,11) = (2,12) =2 
(3,11) = (3,12) =1 
(4,11) = (4,12) =2 

(7.5) (5,11) = (5,12) =2 
(6, 12) = 1, (6,11) =2 
(7,11) = 1, (7,12) =2 
(9,11) = 1, (9,12) =2 
(11, 12) = 1. 


Now common 1-associates of 2 and 3 are 1, 9 where (1,9) = 2. Hence utilizing 
the previous relations we have the scheme 


3 2 1 9 


12 
| 6 
11. 


S; : 





[, ASSOCIATION SCHEME 


Similarly we have S; and then S, by combining S; and Ss. 


gi ~~ gE sp 
10 

4 
\8. 
te 
12 10 
l6 (4 
ha 


Ss : 


Ss: 


8. 


S;, Ss, So give rise to the following relations. 


(8,11) = (8,12) =2 


(7.6) (10, 12) = 1, (10, 11) = 2. 


Now among the treatments 1, 2, --- , 12, the 1-associates of 4 are 1, 2, 7, 10. 
Let the remaining two 1-associates of 4 be 13 and 14. The common 1-associates 
of 4 and 1 are 2, 7, where (2,7) = 2. Writing the row 


4 1 2 7 


we see that of the remaining 1l-associates of 4 i.e., 13, 10, 14, the treatment 10 is 
l-associate of 2. Without loss of generality assume that (10, 13) = 1 and (10, 
14) = 2. Then we have the scheme 


4 


Sto : 10 


We have similarly 


Su : 


and by combining Sy and Sy 


4 
13 
110 
14 


Sx : 


From Sy , Su , Siz we get the relations 





794 S. 8. SHRIKHANDE 














(1,13) = (1,14) =2 
(2,13) = (2, 14) = 2 
(3, 13) = (3, 14) = 2 
(4,18) = (4,14) <=1 
(7.7) (5,13) = (5, 14) = 2 
(6,13) = 1, (6,14) =2 
(7,14) = 1, (7,13) =2 
(10, 13) = 1, (10, 14) = 2 
(13, 14) = 1. 
Again we can verify that the only possible schemes for rows 
4 2 1 10 and 2 4 1 10 
are 
+t 2 1 10 
14 
(7 
113 
and 
2 4 1 10 
9 
3 
8. 


Combining these into the scheme 


4 2 ] 10 





ea 
Ay I3 
13 «#8 


we have the relations 


(8,13) = (8,14) =2 
(7.8) 
(9, 14) = 1, (9,13) = 2. 
Now the 1-associates of 5 amongst the treatment 1, 2, --- , 14 are 1, 6, 7, 8. 


Hence 15 and 16 are the remaining two 1-associates of 5. The common 1-associ- 
ates of 5 and 1 are 6, 7 where (6, 7) = 2. Now 8 is known to be 2-associate of 
6 and 7. Hence 8 occupies the second position in the column for 5. Let 15 be 
l-associate of 6; hence 16, 2-associate of 6. Then we have 





LL, ASSOCIATION SCHEME 795 


1 6 7 





We also have 





and hence combining these two we get 
5 1 6 7 





wh 8 2 
Su: (15 3 
16 |4 


and we get. the following relations. 
(1,15) = (1,16) =2 
(2,15) = (2,16) =2 
(3, 15) = (3,16) =2 
(4,15) = (4, 16) = 2 
(7.9) (5, 15) = (5, 16) = |] 


(6, 15) = 1, (6, 16) = 2 
(7, 16) = 1, (7, 15) = 2 
(8,15) = (8,16) =1 


(15, 16) = 2. 


Consistent with the previous relations, it is easy to verify that we have the 
only possible schemes 


5 6 1 15 
. 16 12 
Sis 7 13 
8 13 
giving the relations 
(12, 13) = (12, 16) = 1, (12, 15) = 2 
(7.10) 
(13, 15) = 1, (138, 16) = 2 
and 
4 Te a 
10 16 
S $ 
. 2 5 





796 8. 8S. SHRIKHANDE 


giving the relations 


(10, 16) = 1 
(7.11) (11, 14) = (11, 16) = 1, (11,13) = 2 
(14, 16) = 2. 


Now counting the 1-associates and 2-associates of 12 in the previous relations 
we get 


(7.12) (12, 14) = 


Similarly counting the 1-associates and 2-associates of 9 in the previous relations 
we see that the l-associates of 9 are 2, 3, 8, 11, 14 and either 15 or 16. Now 
(7,9) = 2 and 1-associates of 7 are 1, 4, 5, 11, 14 and 16. Hence from the value 
Pii(7, 9) = 2 it is easy to see that 


(9,15) = 1 
(9, 16) = 2. 


Again counting the l-associates and 2-associates of 10 in the previous relations 
we see that 


(7.13) 


ll 


(7.14) (10, 15) = 2. 
Similarly we can verify that 

(11,15) = 2 
(7.15) 

(14, 15) = 1. 


The relations (7.2) to (7.15) give the following table of 1-associates. 


Treatment 1-associates 
1 2, 3, 4, 5, 6, 7 
2 1, 3, 4, 8, 9, 10 
3 1, 2, 6, 9, 11, 12 
4 1, 2, 7, 10, 13, 14 
5 me A 8, 15, 16 
6 1, 3, 5, 12, 18, 15 
7 1, 4, 5, i, 14, 16 
8 2, 5, 9, 10, 15, 16 
9 2, 3, 8, 11, 14, 15 
10 2, 4, 8, 12, 13, 16 
11 3, 7, 9, 12, 14, 16 
12 3, 6, 10, 11, 13, 16 
13 4, 6, 10, 12, 14, 15 
14 4, 7, 9, 13, 11, 15 
15 5, 6, 8, 9, 18, 14 
16 5, 7, 8, 10, 11, 12 





Ie ASSOCIATION SCHEME 797 


It is obvious that the association scheme for this case is unique and that the two 
common l-associates of any two treatments, which are l-associates, must be 
2-associate, for otherwise the association scheme is of L, type from case (i). 
Mesner [8] has shown that for s = 4, if we interchange the first and second 
associates in L; we get a design with parameters (7.1). The association scheme 
for case (ii) must therefore be of the same type as obtained from Mesner’s 
result. 

We now give an example due to Mesner [8], to show that there actually exists 
a design for s = 4, which has the association scheme of case (ii) described above. 
Consider the following Latin Square 


A B C D 
B Cc D A 
C D A B 
D A B C 


which has the property that there exists no Latin Square of side 4 which is 
orthogonal to it. Superimposing the above Latin Square on the square array 


1 2 3 4 
5 6 7 8 
9 10 11 12 
13 14 15 16 


and forming blocks corresponding to the rows, columns, and letters of the 
Latin square we get the following twelve blocks for 16 treatments: (1, 2, 3, 4), 
(5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 
11, 15), (4, 8, 12, 16), (1, 8, 11, 14), (2, 5, 12, 15), (3, 6, 9, 16), (4, 7, 10, 13). 
Any two treatments either do not occur together in any block (in which case 
they are l-associates), or they occur together exactly in one block (in which 
case they are 2-associates). It is easily verified that the design is a partially 
balanced design with two associate classes with v = 16, b = 12,r = 3, k = 4, 
m = 6, m2 = 9,» = 0, A» = 1, pu = 2, pi2 = 3, pn = 2. It is easy to see that 
(1, 6) = 1, and their common 1-associates are 12 and 15 where (12, 15) = 2. 
Hence the association scheme of this design is not of L, type. It must, therefore, 
correspond to the association scheme of case (ii). 


REFERENCES 


[1] R. C. Bose anp W.S. Connor, ‘“‘Combinatorial properties of group divisible incomplete 
block designs,’”’ Ann. Math. Stat., Vol. 23 (1952), pp. 367-383. 

[2] R. C. Bose ann K. R. Narr, “Partially balanced incomplete block designs,’’ Sankhyd, 
Vol. 4 (1939), pp. 337-372. 

[3] R. C. Bose anp T. Saimamoro, ‘Classification and analysis of partially balanced 
incomplete block designs with two associate classes,’’ J. Amer. Stat. Assn., 
Vol. 47 (1952), pp. 151-184. 

[4] R. H. Bruck anp H. J. Ryser, ‘‘Nonexistence of certain finite projective planes,’’ 
Canadian Jour. Math., Vol. 1 (1949), pp. 88-93. 





798 S. S. SHRIKHANDE 


[5] W. S. Connor, ‘‘The uniqueness of the triangular association scheme,’’ Ann. Math. 
Stat., Vol. 29 (1958), pp. 262-266. 

(6) W. 8S. Connor anv W. H. Ciatworrny, ‘Some theorems for partially balanced de- 
signs,’’ Ann. Math. Stat., Vol. 25 (1954), pp. 100-112. 

[7] B. W. Jones, The Arithmetic Theory of Quadratic Forms, John Wiley and Sons, New 
York, 1950. 

[8] D. M. Mesner, ‘‘An investigation of certain combinatorial properties of partially 
balanced incomplete block designs and association schemes, with a detailed study 
of designs of Latin square and related types,’’ unpublished thesis, Michigan State 
University, 1956. 

[9] M. N. Varrax, ‘On the Hasse-Minkowski invariant of the Kronecker product of 
matrices,’’ Canadian Jour. Math., Vol. 10 (1958), pp. 66-72. 

(10) MANowar Narwar VarTAk, “The non-existence of certain PBIB designs,” to be pub 
lished in Ann. Math. Stat. 





COMBINING INTER-BLOCK AND INTRA-BLOCK INFORMATION 
IN BALANCED INCOMPLETE BLOCKS 


By Franxkuin A. GRAYBILL AND Davip L. WEEKS 


Oklahoma State University 


0. Introduction. When an Eisenhart Model III [1] (blocks random, error 
random) is assumed in a balanced incomplete block (BIB), two independent 
estimates of treatment differences have been exhibited by Yates [5]. A com- 
bined estimate of treatment differences has also been set forth by Yates but 
none of the properties of the combined estimate have been given. 

It is the purpose of this paper to show that Yates’ combined estimate is based 
on a set of minimal sufficient statistics. A form of the combined estimate is set 


forth in the paper which is shown to be unbiased and which is also based on a 
set of minimal sufficient statistics. 


1. The model. The balanced incomplete block is defined as a design in which 
t treatments are applied to b blocks of k < ¢ cells per block with r replicates 
per treatment with every pair of treatments occurring in all blocks an equal 
number ()) of times. 

The mathematical model may be formulated as a special case of a two way 
classification model with unequal numbers in the cells (3). 


(1) Yign = a+ Bit te + Ciem 
where i = 1,2, ---,b6;q = 1,2,---,t; andm = 0,1, ---, nig. 

In a BIB, ni, is equal to 1 if the gth treatment is in block 7 and equal to 0 
if the gth treatment does not occur in block 7 and if n;, = 0 the corresponding 


observation does not exist, i.e., yig does not exist for any 7 and g. 
We can now deduce: 


23 Nig = 13 >> nig = k; Dd Nia = bk = tr = n; 
‘ q t@ 


> Nighig? = A for all g ¥ q’. 


2. Distributional assumptions. We will assume an Eisenhart model III with 
the block effects (8;) and errors (ejgm) normally and independently distributed 
with the following properties (Z will denote mathematical expectation) : 


E(€ien) = 0 E(CiqnsCre) =o if t=r, gq=s, and m=! 
0 otherwise 
E(B;) E(8;-8.) = 05 if i=s 
0 otherwise 
E(B ree) = 0 for all , r, s, and t. 


Received May 20, 1958; revised November 27, 1958. 
799 





800 FRANKLIN A. GRAYBILL AND DAVID L. WEEKS 


3. Matrix model. (1) represents n equations and they may be written in 
matrix notation as, 


(2) Y=a(Ji)+L'‘8+ D'r*¥+e 


where 8 is a vector with elements 8; and r* a vector with elements , and 
where we will consider the n components of the vector Y ordered on the block 
subscript first and then the treatment subscript. The dimensions of the ma- 
trices in (2) are: Y(n X 1), L’/(n X b), B(b XK 1), D’(n X t), r*(t & 1), and 
e(n X 1). We will let (J) denote an n X m matrix with every element equal 
to 1. 

We will now reparameterize by constructing a ¢t X ¢ orthogonal matrix P* 
which has the property that every element in the first column is equal to 1/+/2. 
Then rewriting (2) we have 


Y = a(J?) + L’B + D’P*P*'s* + «. 
By partioning P* into ((1/+/t)Ji, P) we will obtain 
(3) Y=yn(J1)+L/8+A'r+e 


where » = a + (1/t) >>, 7¢, A’ = D’P, r = P’r* where A’ isn X (t — 1) 
and ris (tf — 1) X 1. Each element of 7 is now an estimable function (contrast) 
of the 7}. 

In addition we will define N to be the t X b matrix 


— ne '* Se 
m2 Nea *': Me 
Mme Mae *** Moe 


The following relationships can be shown to hold: 


wat UPne . ot ~ Fai; 


LL’=kI, L'(Ji)=(Jt), (Js)L’ = k(Ji); 
DD’ =rI, D(Ji) =(JT), (J4)D! = r(J2); 
NN’ = (r—YA)I+XJi), DL’ =N,  (Js)N’ = r(J), 
(Ji)N = k(J3). 


We will use the following definitions: 
If B; = Doom Yigm, then (LY)’ = (BB, --- By) = B’ 
If T, = Ds nigBi, then (NLY)’ = (1172 --- T:) = T’ 
If Vz = Doim Yiem, then (DY)’ = (ViV2--- Vi) = V’ 


1fQ,=Vi—1T,, then | (> - t wx) v| = (@a--Q =@ 





COMBINING INFORMATION IN BIB 


and where 
1 
ioe, Tal DM, ZV 
q q 
mY Yin - 


4. Minimal sufficient statistics. In this section we will exhibit a set of mini- 
mal sufficient statistics for the t + 2 parameters 7, 72, --- 
o in (3). 

We will first find the distribution of the vector Y. Y is distributed as the n- 
variate normal with 


(a) mean equal to 
a= E(Y) = Elu(Ji) + L'B + A’r + | = w(JT) + A’r. 
(b) and variance-covariance matrix equal to 
== E(Y — EY)(Y — EY)’ 
= E(L'B + «)(¢ + BL) 


ol + (42 “) ko 


i sil * 
and Z* = wl — (“ = ) ut 


2 
» Tt-1, BM, OB, and 


k 
where w = o , w* = (0° + kos) 
The joint density of the yigm is then 
a 1 —}(Y—p)’ 2 ~-1(Y—A) 
” JY) = Garey 


The quadratic form in the exponent of (4) is equal to (neglecting the —}) 


[Y — p(t) — A's} E - wo L|tv Sukie) «iil 


It can be shown that the quantity may be put in the following form: 


,* * —_ 
Y’(I — D’R)'L'LU — D’R)Y + roe (+ — P’RY)'(r — P’RY) 


+ w*bk(y --- —p)’ + w¥’(I — D’G)’ ¢ - tUL) (I — D'G)Y 


+ mr P’'GY)'(r — P’'GY), 


where we define the matrices R and G by 


om 1 , oa t Y k - int) 
R= +,[put rit) G = K(p ; NL). 





802 FRANKLIN A. GRAYBILL AND DAVID L. WEEKS 


We are now able to define 2¢ + 1 statistics, namely, P’'GY, P’RY, y---, 
(1/k)¥’UI — D/R)L/LUI — D’R)Y, and 
Y’U — P’G)(T — (1/k)L’L) UI — P’G)Y 


which are seen to be sufficient for the t + 2 parameters in (3). 
To simplify the notation in the ensuing discussion we will let 


U=PGY X= P'RY 
S* = (1/k)Y’(I — D’R)L/LU — D’'R)Y 
S’ = Y'(I — D’G)(I — (1/k)L’L) (I — D’G)Y. 


By applying the operation defined on page 328 of [4] it can be shown that the 
2¢ + 1 sufficient statistics as defined above form a minimal set. 


5. Distributional properties of the minimal sufficient statistics. The ¢ — 1 X | 


vector U = (u,) is distributed normally with mean 7 and covariance matrix 
(k/At)o"I, so u; is an unbiased (intra-block) estimate of 7; with variance 
(k/At)o’. 
The ¢ — 1 vector X = (z;) is distributed normally with mean 7 and co- 


variance matrix (k/(r — ))(o*° + kog)I so x; is an unbiased estimate (inter- 
block) of 7; with variance (k/(r — d))(o° + ko). 

The scalar S’/o’ is distributed as a central chi-square variate with 

n—-b-—-t+l=f 

degrees of freedom, so S’/f is an unbiased estimate of o’. 

The scalar S**/(o’ + ko) is distributed as a central chi-square variate with 
b — t degrees of freedom if b > t, so S**/(b — t) is an unbiased estimate of 
o + kos. If b = t, then S* is not defined. 

The scalar y --- is distributed normally with mean y» and variance 


(o” + kog)/bk. 


‘ Be 2 cg? 
The statistics uw, we,--**, Uer, M1, t2,°*', tu, S, S*, y---, can also 


be shown to be mutually independent. 


6. The analysis of variance. To obtain estimates of the parameters from the 
data, we may compute the analysis of variance table as in Table 1. 

In order to see that Yates’ definition of the combined estimate of 7 is based 
on the minimal sufficient statistics, we can verify that 

A; = ((r — A)/k)X'X, A, = S*, A; = (At/k)U'U, A, = S 


Ag = C, ; As = C2 ; Cc* = Ci + C2 ’ Sa + C3 => A + As. 


From these can be deduced the relation C; = A; + As — C3 or 


Ci = (At(r — d)/rk’)(U — X)'(U — X). 





“sosA[VUB 440g UI BUIES +> :9}0N 


wbs 


[F301 SSL SSL = ( ---4 -“"*) | ale [890], 


10119 YIO[G-B1IZUT Qe ‘ty —-.V - SSl “10119 YOO]G-Vs}UT 
PN 


(syoo]q Buo0ust) “s4z17 0g¢- ***(syoo|q Zuryeurunya) syueulyvesy, 
4 


Jopureulsdy 'y —.V Jopurewey 


£ — 4)! 
eC — ‘D) x x 4 ‘*-gueuoduioa yueulyvely, 


I 
sq — *g) Z | (s}U9UIZBeL} ZuLOUBI) syoo[g 


juouodulo0o JUOUIVeLT, 


('8}43 Zuiyeurwtye) syoolg 


a01n0g — oy $s anos 





S 
= 
Zz 
_ 
Zz 
° 
_ 
& 
< 
= 
3 
oa 
v4 
— 
o 
Z 
Zz 
— 
= 
= 
° 
oO 


nun A fo ssfjuy 
1 ATaVL 





804 FRANKLIN A. GRAYBILL AND DAVID L. WEEKS 


7. Combining the estimators. If o? and oj are known, then the linear com- 
bination of u; and z; which gives the minimum variance unbiased estimate of 
7; is givenjby Pn 
u; var (2;) + 2; var (u;) 
var (x;) + var (u;) 
or 
(5) usdt(o” + kos) + ai(r — rA)o’ 
At(o? + kos) + (r — Ao? 


Yates uses now C* and C, from Table I to estimate o” and o° + kos. That 
is, he uses 





¢ = Ai/f = Ci/f 
s-| ag ~ oe) 
Lb - 1) SIGE 


Letting ¢{ be the estimate of o° + kos by Yates’ procedure we have 
2 C, , kb—1)}] C* Cy 
ar 7F yo a] 


sn [“— 
“i —-)l re 





T , Pe *2 (k — t) v2 
(U-X)(U-X)+8 |+#=4s 


Therefore, using the form of (5) and inserting the estimates for the unknown 
variances we have for the estimate of 7; 


Ar — dA) - 7, “TT k ¥ 42 a= s. 12 
ux At [38 (U — X)'(U X) + gan" + ig a | 








re (r — X) ¢ 
mh. Xr — d) k (k - 
, = , HTT — ¥ Y 42 aid 2 
uReS Dy NO - Dt gins teas 
+ (r > d) s 


Actually, Yates defined the estimate of 7; to be the quantity given in (6) if 
65 > 0 and defined it to be equal to (1/r)V, if #3 < 0. Clearly Yates’ estimate 
so defined is based on the set of minimal sufficient statistics. 


8. An alternative combined estimate. If we let the estimate of r; be the 
quantity defined in (6) for all values of 63, we will show that this is an un- 
biased estimate of 7;. Rewriting (6) we have 


uel + alr — re 
a o- Oe 
, Aloi + (r — Ajo? uy + il 7) 


where 


S dé; 
~ tO? + (r — rN 


v and —x <y<l. 





COMBINING INFORMATION IN BIB 805 


7; can now be written as z; + (u; — 2;)y and since E(z;) = 7;, 7; is un- 
biased if E(u; — 2;)y = 0. Letting 2; = (u; — 2,), 7; is a function of t + 1 
mutually independent random variables. Denote the density of these ¢ + 1 
random variables by h. Since the z; are normal with means equal to zero, h is 
an even function of each z;. y is also an even function of z; and since 


—xn <4< om, 


it follows that E(zz) = 0. Therefore, 7; is unbiased. 

The minimal set of sufficient statistics set forth in this paper are not com- 
plete since E(u; — 2z;) = 0, ie., there is a non-trivial unbiased estimate of 
zero. 

Therefore, the problem of which estimate of 7; is “best”? is not straight- 
forward. This will be dealt with in another paper. 


REFERENCES 


{1] Cuurcuttt Ersennart, ‘““The assumptions underlying the analysis of variance,” 
Biometrics, Vol. 3 (1947), pp. 1-21. 

[2] D. A. 8S. Fraser, “A note on combining of inter-block and intra-block estimates,’’ 
Ann. Math. Stat., Vol. 28, (1957), pp. 814-816. 

[3] O. Kemprnorne, The Design and Analysis of Experiments, John Wiley and Sons, New 
York, 1952. 

[4] E. L. Leamann anv H. Scuerre, ‘Completeness, similar regions, and unbiased esti- 
mation—Part I,” Sankhya, Vol. 10, (1950), pp. 305-340. 

[5] F. Yates, ‘‘The recovery of inter-block information in balanced incomplete block 
designs,’’ Ann. Eugenics, Vol. 10, (1940), pp. 317-325. 





ASYMPTOTICALLY EFFICIENT TESTS BASED ON THE SUMS OF 
OBSERVATIONS! 


By Joun H. MacKay 
Georgia Institute of Technology 


Summary. For tests, @ = {¢}, of composite hypotheses, w; and w, , asymp- 
totic efficiency is defined in terms of the behavior as a — 0 of the sample size 
Ng required to reduce the maximum risk to a. For problems where the w; 
contain elements 6; whose relative densities satisfy 


sup inf Ee fo/fr)‘ = inf Es f2/fi)' = sup inf Eo(fe/fi)‘, 


Chernoff’s Theorem 1 [2] is applied to the non-randomized test o*, with ¢ = 1 
or 0 according as >> log (f2/f;) > 0 or not, and proves $* asymptotically effi- 
cient (Theorem 2.1). 

The principal results of the paper are applications of Theorem 2.1 to tests 
of the difference ( — 7) of binomial probabilities with samples of relative size 
m/n. For wo = {§ — n S — 4, w = {& — n = 4}, certain tests of the form 
¢: = 1 if and only if \(£ — 4) > (4 — 4), with d increasing in m/n, turn out 
to be asymptotically efficient, while all tests of the form y = 1, a, , 0 according 
as (£ — 4) is greater than, equal to, or less than c, are asymptotically inefficient 
when m ~ n. For given relative sampling costs, the ratio m/n may be chosen 
so that the asymptotic cost of observations is minimized. 


1. Introduction. Our results concerning asymptotic efficiency depend heavily 
on the work of Cramér and Chernoff ({1], [2]). In order to use these results in 
connection with the binomial problem mentioned above, we find a test of the 
composite hypothesis which depends on the sum of observations X, each of 
which is the likelihood ratio of distributions indexed by 6; ¢ w; . If Mo(t) is the 
moment generating function of X and ps = inf M,(t), we try to choose the 0; so 
that p» attains its maximum in w; at the point 6;. We then employ a Bayes risk 
to establish a lower bound for the minimum sample size required to reduce the 
maximum risk to a, and use Chernoff’s Theorem 1 to show that the corresponding 
sample size for our test is asymptotically equal to this lower bound. 

Let 6 ¢ 2 be a 1-1 index on a class of distributions on a probability space with 
elements Y and let w; , w. be disjoint subsets of 29. Let Y = (Yi, Y2,---) bea 
sequence of independent random elements with a common distribution indexed 
by 6 €Q. A test (sequence of tests) @ = {¢,} with ¢ depending only on Y,,--- , 
Y, will be described by the probabilities ¢,(Y), assigned to the decision ‘“‘@ € w. .” 
The loss of the decision “‘@ ¢ w,’”’ is denoted by w;(@) and it is assumed that 


Received May 21, 1957; revised February 14, 1959. 

! Most of the results in this paper were obtained in research work sponsored by the United 
States Air Force through the Air Force Office of Scientific Research of the Air Research 
and Development Command under Contract Number AF 18(600)-458, and were included 
in the author’s PhD thesis at the University of North Carolina. 


806 





ASYMPTOTICALLY E#FICIENT TESTS 807 


(1.1) OS w,(0)SB< ~, 0 < w,(@), ifandonly if@ew;, j #i = 1,2. 
The k-observation expected loss for the test } is designated r,(6, ), 
re(0, >) = we(0)E od, + wi(O)Eo(1 — dy). 
Definitions 1.1. For any test @, any distribution P on Q, and any a > 0, 
re(P, >) = Eers(0,o), = (>) = sup re(9,), 1% = int re(), 


and, with LJ abbreviating “‘the least integer k such that,” 
Np = LI [inf n(P,) S al, N = LI{n & al, Nz = Li[ri(@) S al. 
@ 


We note that inf,’ m.(P,o’) S re S re(¢) for all k, P and } and hence 


(1.2) NrpSNSEN, for all a, P, &. 


Thus, for each a, N/N, defines an index of efficiency of the test , and 6 will 
be called asymptotically efficient as a — 0 if and only if 


(1.3) Nw~WN,g asa — 0. 


2. Asymptotic efficiency of tests based on sums. Let X, , X2 --- denote the 
values of a real function at Y,, Y2 --- respectively. 


Definitions 2.1. 
M(t) = Eve”, —2 <t< oo, py» = inf M,(t), 
t 


Se=Xit--- +X, of (¥) -| 


lif S, > 0 
0 otherwise| 


We shall use Chernoff’s Theorem 1, a variant of his remark (3.11) and part of 
the general version of his lemma 8 [2]. 
THEOREM (Chernoff). If —«» s EX < Oand0O < ¢€ S p, where p = inf M(t), 


(p — e)* = ofPr{S, = 0}], 
(T) . 
Pr{S, = 0} s p*. 


(R) Remark (Chernoff). If M(t) < 1 for some t > 0, EX <0. 
(A direct proof consists in first noting that the existence of EX is implied, and 
then that M is non-decreasing on t 2 0 if EX = 0.) 

Lemma (Chernoff). If f, and fz are probability densities with respect to u of distinct 
distributions and X(Y) = log fo(Y) — log fi( Y), then 


M,(t) = M(t + 1) 0<t<1, 


(L) the p; are attained for t; with 2 = th — 1 <0 < t, and 


A=p= inf [x , an <i. 


0<t<l1 





808 JOHN H. MacKAY 


The following theorem is implicit in [2] for the case where the w; are simple. 
THEOREM 2.1. If 


(a) X(Y) = logfo(Y) — log fi( Y) with fe/f, the likelihood ratio of distributions 
indexed by 02 € we and 0, € w,, 


(b) pe S pi on w; (t = 1, 2), 

(c) EeX < 0 onw,, EeX > 0 on w:, then 

(i) the test 6* is asymptotically efficient as a — 0 and, 

(ii) except in the trivial case where p, = 0 and N = N,- = 1, 


Proor. By (c) it follows from (L) that p, = p, < 1. Thus if p, = 0, pp = 0 
on w, U w and it follows from (c) and the definition of p» that the distributions 
of w, concentrate on X < 0 while those of w, concentrate on X > 0 (Lemma 1 
of [2]). 

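By (a), $\phi^*$ is Bayes with respect to the prior $P^*$ concentrating on $\theta_1$ and $\theta_2$ and assigning to $\theta_i$ probabilities proportional to $w_i(\theta_j)$, $j \ne i = 1, 2$. Letting

$w^* = w_1(\theta_2) w_2(\theta_1) / [w_1(\theta_2) + w_2(\theta_1)]$,

$r_k(P^*, \phi^*) = w^*[E_1 \phi_k^* + E_2(1 - \phi_k^*)]$,

(T) applied to $X$ at $\theta_1$ and to $-X$ at $\theta_2$ yields, for any $0 < \epsilon \le \rho_1$,

(2.1) $r_k(P^*, \phi^*) \ge 2w^*(\rho_1 - \epsilon)^k$ for all $k \ge k(\epsilon)$.

Hence $\alpha \ge r_{N_{P^*}}(P^*, \phi^*) \ge 2w^*(\rho_1 - \epsilon)^{N_{P^*}}$ for all $\alpha < 2w^*(\rho_1 - \epsilon)^{k(\epsilon)}$, and therefore

(2.2) $N \ge N_{P^*} \ge \dfrac{\log \alpha - \log 2w^*}{\log(\rho_1 - \epsilon)}$ for all $\alpha < 2w^*(\rho_1 - \epsilon)^{k(\epsilon)}$.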


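To complete the proof in the case $\rho_1 > 0$, we obtain an upper bound for $N_{\phi^*}$ through one for $r_k(\phi^*)$. Using (1.1), (T) for each $\theta$, and (b), with $\bar w$ denoting an upper bound for the losses:

(2.3) $r_k(\phi^*) = \max[\sup_{\omega_1} w_2(\theta) E_\theta \phi_k^*, \ \sup_{\omega_2} w_1(\theta) E_\theta(1 - \phi_k^*)] \le \bar w \max[\rho_2^k, \rho_1^k] = \bar w \rho_1^k$.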


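From (2.3) with $k = N_{\phi^*} - 1$, and with $\bar w$ the loss bound used there,

(2.4) $N_{\phi^*} < 1 + \dfrac{\log \alpha - \log \bar w}{\log \rho_1}$,

which, with (1.2) and (2.2) (letting $\epsilon \to 0$), completes the proof.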
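The bounds (2.2) and (2.4) can be evaluated directly. The following minimal sketch uses hypothetical values of $\rho_1$, $\epsilon$, $w^*$ and $\bar w$, none of them taken from the paper. It shows that both bounds grow like $\log\alpha/\log\rho_1$ as $\alpha \to 0$, and that their ratio tends to $\log\rho_1/\log(\rho_1 - \epsilon)$, which is close to 1 for small $\epsilon$. The bound (2.2) applies only for $\alpha$ below the threshold stated there.

```python
import math


def n_lower(alpha: float, rho1: float, eps: float, w_star: float) -> float:
    """Right-hand side of (2.2): a lower bound for N."""
    return (math.log(alpha) - math.log(2 * w_star)) / math.log(rho1 - eps)


def n_upper(alpha: float, rho1: float, w_bar: float) -> float:
    """Right-hand side of (2.4): an upper bound for N_phi*."""
    return 1 + (math.log(alpha) - math.log(w_bar)) / math.log(rho1)


# Hypothetical inputs, chosen only to illustrate the limit.
rho1, eps, w_star, w_bar = 0.9, 1e-3, 0.25, 1.0
for alpha in (1e-5, 1e-50, 1e-300):
    lo, hi = n_lower(alpha, rho1, eps, w_star), n_upper(alpha, rho1, w_bar)
    print(f"alpha={alpha:.0e}  lower={lo:.1f}  upper={hi:.1f}  ratio={lo / hi:.4f}")

print("limit of ratio:", math.log(rho1) / math.log(rho1 - eps))
```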





ASYMPTOTICALLY EFFICIENT TESTS 


[Fig. 1. Loci of $\theta_1$ for various ratios $m/n$, and the sets $\omega_i$ for $\delta = .125, .75$.]


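3. Applications to tests on the difference of binomial probabilities.
EXAMPLE 1. (See Figure 1.) Let $\theta = (\xi, \eta)$, $\Theta = [0, 1] \times [0, 1]$, $0 < \delta < 1$, $\omega_1 = \{\xi - \eta \le -\delta\}$, $\omega_2 = \{\xi - \eta \ge \delta\}$, and $Y = (U, V)$ with density

$f_\theta(y) = \binom{m}{u} \xi^u (1 - \xi)^{m-u} \binom{n}{v} \eta^v (1 - \eta)^{n-v}$ on $\{0, 1, \ldots, m\} \times \{0, 1, \ldots, n\}$.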
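Then

$X(Y) = X(Y \mid \theta_1, \theta_2) = U \log\dfrac{\xi_2}{\xi_1} + (m - U)\log\dfrac{1 - \xi_2}{1 - \xi_1} + V \log\dfrac{\eta_2}{\eta_1} + (n - V)\log\dfrac{1 - \eta_2}{1 - \eta_1}$,

and

$M(t) = M_\theta(t \mid \theta_1, \theta_2) = \left[\xi\left(\dfrac{\xi_2}{\xi_1}\right)^t + (1 - \xi)\left(\dfrac{1 - \xi_2}{1 - \xi_1}\right)^t\right]^m \left[\eta\left(\dfrac{\eta_2}{\eta_1}\right)^t + (1 - \eta)\left(\dfrac{1 - \eta_2}{1 - \eta_1}\right)^t\right]^n$.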


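Since (with $(1, 1)$ abbreviated to $1$) $\omega_2 = 1 - \omega_1$ and $M_\theta(t \mid \theta_1, \theta_2) = M_{1-\theta}(-t \mid 1 - \theta_2, 1 - \theta_1)$, we will seek $\theta_1$, $\theta_2$ in $\{\theta_1 + \theta_2 = 1\}$ satisfying (b) and (c) of Theorem 2.1 on $\omega_1$. Here

(3.1) $M(t) = M_\theta(t \mid \theta_1) = [\xi(\xi_1^{-1} - 1)^t + (1 - \xi)(\xi_1^{-1} - 1)^{-t}]^m \, [\eta(\eta_1^{-1} - 1)^t + (1 - \eta)(\eta_1^{-1} - 1)^{-t}]^n$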







is the moment generating function of

$(2U - m)\log(\xi_1^{-1} - 1) + (2V - n)\log(\eta_1^{-1} - 1)$

and [since this variable is degenerate if and only if $\theta_1 = (\frac12, \frac12) \notin \omega_1$] is strictly convex. The component powers are either identically one or have unique infima at

(3.2) $t_\xi = \dfrac12 \, \dfrac{\log(\xi^{-1} - 1)}{\log(\xi_1^{-1} - 1)}$, $t_\eta = \dfrac12 \, \dfrac{\log(\eta^{-1} - 1)}{\log(\eta_1^{-1} - 1)}$.





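In any event

(3.3) $L(\theta) = [4\xi(1 - \xi)]^{m/2} [4\eta(1 - \eta)]^{n/2} \le \inf_t M_\theta(t \mid \theta_1) = \rho_\theta(\theta_1)$.

Since equality holds in (3.3) for $\theta = \theta_1$ (both infima in (3.2) then occur at $t = \frac12$), $\theta_1$ can satisfy (b) only if $\theta_1$ maximizes $L(\theta)$ on $\omega_1$. We first obtain an explicit characterization of the maximizer of $L(\theta)$ on $\omega_1$.

Let $I(\delta)$ be the open interval of $\xi$,
$I(\delta) = (\max[0, \frac12 - \delta], \min[\frac12, 1 - \delta]) = (\frac12 - \delta, \frac12)$ if $0 < \delta \le \frac12$, and $(0, 1 - \delta)$ if $\delta > \frac12$.
It follows from
$L(\xi, \eta) < L(\frac12 - \delta, \frac12)$ if $\xi \in (0, \frac12 - \delta)$,
$L(\xi, \eta) \le L(\xi, \xi + \delta)$ if $\xi \in I(\delta)$,
$L(\xi, \xi + \delta) < L(\frac12, \frac12 + \delta)$ if $\xi \in (\frac12, 1 - \delta)$,
where $(\xi, \eta) \in \omega_1$, that the maximum can only occur in $\{\theta \mid \eta = \xi + \delta, \ \xi \in I(\delta)\}$. Since

(3.4) $\dfrac{\partial}{\partial \xi} \log L(\xi, \xi + \delta) = \dfrac{m}{2}\left[\dfrac{1}{\xi} - \dfrac{1}{1 - \xi}\right] + \dfrac{n}{2}\left[\dfrac{1}{\xi + \delta} - \dfrac{1}{1 - \xi - \delta}\right]$

is decreasing with respect to $\xi$ on $(0, 1 - \delta)$ and changes sign from positive to negative as $\xi$ traverses $I(\delta)$, $L(\theta)$ has the unique maximizer

(3.5) $\theta_1 = (\xi_1, \eta_1)$, $\eta_1 = \xi_1 + \delta$, $\xi_1$ the unique zero of (3.4) in $I(\delta)$.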
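The characterization (3.5) is easy to compute. The following is a minimal sketch that relies only on (3.3) and (3.4). Because (3.4) is decreasing in $\xi$ and changes sign on $I(\delta)$, bisection finds $\xi_1$. The helper names and the values of $m$, $n$ and $\delta$ are illustrative.

```python
def interval_I(delta):
    """The open interval I(delta) = (max[0, 1/2 - delta], min[1/2, 1 - delta])."""
    return max(0.0, 0.5 - delta), min(0.5, 1.0 - delta)


def dlogL(xi, m, n, delta):
    """Equation (3.4): d/dxi log L(xi, xi + delta)."""
    eta = xi + delta
    return 0.5 * m * (1 / xi - 1 / (1 - xi)) + 0.5 * n * (1 / eta - 1 / (1 - eta))


def xi1(m, n, delta, tol=1e-14):
    """Unique zero of (3.4) in I(delta), by bisection ((3.4) is decreasing in xi)."""
    lo, hi = interval_I(delta)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dlogL(mid, m, n, delta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def L(xi, eta, m, n):
    """L(theta) of (3.3)."""
    return (4 * xi * (1 - xi)) ** (m / 2) * (4 * eta * (1 - eta)) ** (n / 2)


for m, n in ((10, 30), (20, 20), (30, 10)):
    x = xi1(m, n, 0.2)
    print(m, n, "xi1 =", round(x, 6), "eta1 =", round(x + 0.2, 6),
          "rho1 =", L(x, x + 0.2, m, n))
```

The output illustrates (3.7): $\xi_1$ increases with $m/n$. In the balanced case it equals $(1 - \delta)/2$.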


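Since by (3.3) $\rho_\theta(\theta_1) \le M_\theta(\frac12 \mid \theta_1)$ and $M_{\theta_1}(\frac12 \mid \theta_1) = \rho_1$, (b) will be satisfied if $\theta_1$ maximizes $M_\theta(\frac12 \mid \theta_1)$ on $\omega_1$. Because $\rho_1 < 1$, the remark (R) will then show that (c) is also satisfied.

To dispose of this maximization, note that $\xi_1 < \frac12 < \eta_1$, and hence that $M_\theta(\frac12 \mid \theta_1)$ can be maximal only on $\{\eta = \xi + \delta\}$. For such $\theta$

(3.6) $\dfrac{\partial}{\partial \xi} \log M_{(\xi, \xi + \delta)}(\tfrac12 \mid \theta_1) = \dfrac{m[(1 - \xi_1) - \xi_1]}{\xi(1 - \xi_1) + (1 - \xi)\xi_1} + \dfrac{n[(1 - \eta_1) - \eta_1]}{(\xi + \delta)(1 - \eta_1) + (1 - \xi - \delta)\eta_1}$,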







which is decreasing with respect to $\xi$ on $(0, 1 - \delta)$ and, by (3.5), vanishes at $\xi_1$. Thus $\theta_1$ is the unique maximizer of $M_\theta(\frac12 \mid \theta_1)$, and (b), (c) of Theorem 2.1 are satisfied.

We summarize this application of Theorem 2.1 to Example 1 in terms of the maximum likelihood estimates (sample proportions) $\hat\xi$ and $\hat\eta$.

THEOREM 3.1. For testing $\{\xi - \eta \le -\delta\}$ against $\{\xi - \eta \ge \delta\}$ with bounded positive losses for wrong decisions, the nonrandomized test $\phi^*$ with $\phi_k^* = 1$ if and only if $(\hat\xi - \frac12)\, m \log(\xi_1^{-1} - 1) + (\hat\eta - \frac12)\, n \log(\eta_1^{-1} - 1) > 0$, where $\xi_1 < \frac12 < \eta_1 = \xi_1 + \delta$ and $\xi_1$ is the unique root of (3.4) in $I(\delta)$, is asymptotically efficient.
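The decision rule of Theorem 3.1 can be written out directly. The following is a minimal sketch. It assumes that $\xi_1$ and $\eta_1$ have already been obtained from (3.5); in the balanced case $m = n$ they equal $(1 \mp \delta)/2$, as noted later in this section. The function name `phi_star` and the input values are hypothetical. Here $k$ replicates of $(m, n)$ trials give $\hat\xi = U/(km)$ and $\hat\eta = V/(kn)$, and $S_k$ is a positive multiple of the statistic below.

```python
import math


def phi_star(U, V, k, m, n, xi1, eta1):
    """Decision of the test in Theorem 3.1 after k replicates of (m, n) trials.

    U and V are total numbers of successes, so xi_hat = U/(k m) and
    eta_hat = V/(k n).  Returns 1 (decide omega_2) or 0 (decide omega_1).
    """
    xi_hat, eta_hat = U / (k * m), V / (k * n)
    stat = ((xi_hat - 0.5) * m * math.log(1 / xi1 - 1)
            + (eta_hat - 0.5) * n * math.log(1 / eta1 - 1))
    return 1 if stat > 0 else 0


# With m = n, xi1 = (1 - delta)/2 and eta1 = (1 + delta)/2, and the rule
# reduces to "decide omega_2 iff xi_hat > eta_hat".
delta, m, n, k = 0.3, 4, 4, 2
xi1, eta1 = (1 - delta) / 2, (1 + delta) / 2
print(phi_star(5, 3, k, m, n, xi1, eta1), phi_star(3, 5, k, m, n, xi1, eta1))
```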

To characterize the behavior of this test with respect to $(m, n)$ variation, first note that by (3.5) $m/n$ increases from $0$ to $\infty$ as $\xi_1$ increases across $I(\delta)$, and hence that

(3.7) $\xi_1$ increases across $I(\delta)$ as $m/n$ increases from $0$ to $\infty$.


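To find the ratio $m/n$ of maximum efficiency, put $m + n = 2M$ and $m(1 - z) = n(1 + z)$ (so that $m = M(1 + z)$, $n = M(1 - z)$), and minimize $\rho_1 = L(\xi_1, \xi_1 + \delta)$ by choice of $z$. By (3.4), $\xi_1$ is a monotone increasing function of $z$. Since the $\xi$-derivative (3.4) vanishes at $\xi_1$, we have

(3.8) $\dfrac{d}{dz} \log L(\xi_1, \xi_1 + \delta) = \dfrac{d\xi_1}{dz}\left(\dfrac{\partial}{\partial \xi} \log L(\xi, \xi + \delta)\Big|_{\xi = \xi_1}\right) + \dfrac{M}{2}[\log \xi_1(1 - \xi_1) - \log(\xi_1 + \delta)(1 - \xi_1 - \delta)] = \dfrac{M}{2}[\log \xi_1(1 - \xi_1) - \log(\xi_1 + \delta)(1 - \xi_1 - \delta)]$,

which increases from negative to positive values as $\xi_1$ crosses $I(\delta)$ and vanishes for $\xi_1 = (1 - \delta)/2$. Thus $\rho_1$ has a unique minimum for $z = 0$, that is, for $m/n = 1$.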
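The conclusion drawn from (3.8) can be checked numerically. The following minimal sketch uses the hypothetical values $M = 20$ and $\delta = 0.2$. For several values of $z$, it computes $\rho_1 = L(\xi_1, \xi_1 + \delta)$, with $\xi_1$ found by bisection on (3.4). It confirms that the smallest value occurs at $z = 0$, where $\xi_1 = (1-\delta)/2$ and hence $\rho_1 = (1 - \delta^2)^M$.

```python
def xi1(m, n, delta, tol=1e-14):
    """Zero of (3.4) in I(delta) by bisection; (3.4) is decreasing in xi."""
    lo, hi = max(0.0, 0.5 - delta), min(0.5, 1.0 - delta)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        eta = mid + delta
        d = 0.5 * m * (1 / mid - 1 / (1 - mid)) + 0.5 * n * (1 / eta - 1 / (1 - eta))
        if d > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rho1_of_z(z, M, delta):
    """rho_1 = L(xi_1, xi_1 + delta) with m = M(1 + z), n = M(1 - z)."""
    m, n = M * (1 + z), M * (1 - z)
    x = xi1(m, n, delta)
    e = x + delta
    return (4 * x * (1 - x)) ** (m / 2) * (4 * e * (1 - e)) ** (n / 2)


M, delta = 20.0, 0.2
for z in (-0.6, -0.3, 0.0, 0.3, 0.6):
    print(z, rho1_of_z(z, M, delta))
print("(1 - delta^2)^M =", (1 - delta**2) ** M)
```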

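If the relative costs of sampling are $c$ and $1 - c$ ($0 < c < 1$), the total sampling cost is $N[cm + (1 - c)n]$, which is asymptotically $K(z) \log \alpha$, where

$K(z) = \dfrac{M[1 + z(2c - 1)]}{\log L(\xi_1, \xi_1 + \delta)}$.

Thus the asymptotically minimum cost occurs when $z$ is chosen to maximize $K(z)$. Using (3.8),

(3.9) $\dfrac{dK}{dz} = \dfrac{M}{(\log L)^2}\left\{(2c - 1)\log L - \dfrac{M}{2}[1 + z(2c - 1)]\log\dfrac{4\xi_1(1 - \xi_1)}{4(\xi_1 + \delta)(1 - \xi_1 - \delta)}\right\} = \dfrac{M^2}{(\log L)^2}\log\{[4\xi_1(1 - \xi_1)]^{c - 1}[4(\xi_1 + \delta)(1 - \xi_1 - \delta)]^{c}\}$,

where $L = L(\xi_1, \xi_1 + \delta)$. The log in (3.9) decreases as $\xi_1$ crosses $I(\delta)$ and vanishes for $\xi_1$ in $I(\delta)$ satisfying

(3.10) $c = \dfrac{\log[4\xi_1(1 - \xi_1)]}{\log[4\xi_1(1 - \xi_1)] + \log[4(\xi_1 + \delta)(1 - \xi_1 - \delta)]}$.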







Hence the asymptotic cost of sampling is minimized for $\xi_1$ in $I(\delta)$ satisfying (3.10). Since $c$ decreases monotonically from 1 to 0 as $\xi_1$ crosses $I(\delta)$, $\xi_1$ decreases across $I(\delta)$ as $c$ increases from 0 to 1. Therefore, by (3.7), the most economical ratio $m/n$ decreases from $\infty$ to 0 as $c$ increases from 0 to 1.
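Together, the condition (3.10) and the relation between $\xi_1$ and $m/n$ given by (3.4) determine the most economical allocation. The following is a minimal sketch for hypothetical values of $c$ and $\delta$. Bisection on the log in (3.9), which decreases across $I(\delta)$, gives $\xi_1$. Setting (3.4) to zero then converts $\xi_1$ into the ratio $m/n$.

```python
import math


def xi1_for_cost(c, delta, tol=1e-14):
    """Solve (3.10) for xi_1 in I(delta) by bisection.

    h(xi) = c log Q - (1 - c) log P, with P = 4 xi (1 - xi) and
    Q = 4 (xi + delta)(1 - xi - delta), is the log in (3.9); it decreases on I(delta).
    """
    lo, hi = max(0.0, 0.5 - delta), min(0.5, 1.0 - delta)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        P = 4 * mid * (1 - mid)
        Q = 4 * (mid + delta) * (1 - mid - delta)
        if c * math.log(Q) - (1 - c) * math.log(P) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ratio_m_over_n(xi, delta):
    """m/n implied by setting (3.4) to zero at xi_1 = xi."""
    eta = xi + delta
    A = 1 / xi - 1 / (1 - xi)      # proportional to the m-term of (3.4)
    B = 1 / eta - 1 / (1 - eta)    # proportional to the n-term of (3.4)
    return -B / A


delta = 0.2
for c in (0.2, 0.5, 0.8):
    x = xi1_for_cost(c, delta)
    print(f"c={c}: xi1={x:.6f}, most economical m/n={ratio_m_over_n(x, delta):.4f}")
```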

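Representing the set of $Y$ where $\phi_k^* = 1$ in the form $\lambda(\hat\xi - \frac12) > (\hat\eta - \frac12)$, we have

(3.11) $\lambda = -\dfrac{m \log(\xi_1^{-1} - 1)}{n \log(\eta_1^{-1} - 1)} = \dfrac{R(\xi_1)}{R(1 - \eta_1)}$,

where

$R(p) = \dfrac{\log(1 - p) - \log p}{p^{-1} - (1 - p)^{-1}}$ on $0 < p < \frac12$.

From $1 - v^{-1} < \log v < v - 1$ for $v > 1$ (applied with $v = (1 - p)/p$), $p < R(p) < 1 - p$. Thus $R(p)$ is positive and increases from $R(0+) = 0$ to $R(\frac12 -) = \frac12$. From this, (3.7), and (3.11), as $m/n$ increases from 0 to $\infty$, $\lambda$ increases

(3.12) from $2R(\frac12 - \delta)$ to $1/[2R(\frac12 - \delta)]$ if $\delta < \frac12$,
from $0$ to $\infty$ if $\delta \ge \frac12$.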
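The identity $\lambda = R(\xi_1)/R(1 - \eta_1)$ in (3.11) can be checked numerically. The following minimal sketch uses the illustrative value $\delta = 0.2$, so that $I(\delta) = (0.3, 0.5)$. It computes $\lambda$ both from its definition, with $m/n$ obtained by setting (3.4) to zero, and from the ratio of $R$ values. It also evaluates the endpoints in (3.12).

```python
import math


def R(p):
    """R(p) = [log(1 - p) - log p] / [1/p - 1/(1 - p)] on 0 < p < 1/2."""
    return (math.log(1 - p) - math.log(p)) / (1 / p - 1 / (1 - p))


def lam_direct(xi1, delta):
    """lambda from its definition in (3.11), with m/n obtained from (3.4) = 0."""
    eta1 = xi1 + delta
    m_over_n = -(1 / eta1 - 1 / (1 - eta1)) / (1 / xi1 - 1 / (1 - xi1))
    return -m_over_n * math.log(1 / xi1 - 1) / math.log(1 / eta1 - 1)


delta = 0.2
for xi1 in (0.31, 0.35, 0.40, 0.45, 0.49):   # points of I(0.2) = (0.3, 0.5)
    eta1 = xi1 + delta
    print(xi1, lam_direct(xi1, delta), R(xi1) / R(1 - eta1))

# Endpoint values in (3.12) for delta < 1/2:
print(2 * R(0.5 - delta), 1 / (2 * R(0.5 - delta)))
```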


If $m = n$ ($c = \frac12$), it is noteworthy that $\xi_1 = (1 - \delta)/2$, $\eta_1 = (1 + \delta)/2$, $\lambda = 1$, and $\phi_k^* = 1$ if and only if $\hat\xi > \hat\eta$. If $m \ne n$, the following theorem shows that this test is asymptotically inefficient.

THEOREM 3.2. If $m \ne n$ and $\psi = \{\psi_k\}$ is a test with $\psi_k = 1, a_k, 0$ according as $(\hat\xi - \hat\eta)$ is $> c_k$, $= c_k$, $< c_k$, then $\psi$ is asymptotically inefficient as $\alpha \to 0$.

Proor. Let w = min{w2(6,), w:(62)] and abbreviate ¢°° to ¢*. It follows from 
the definition of r,(@) and the relations Z.(1 — y¥i*) = Ewit"™*, wi"! = 
vi, that r.(¢) = w max{Eyi*, EW’ ™} = wEwWr. Now ¥* is a test based on 
$(2U - m)/m - (2V - n)/n$, which [cf. (3.1), (3.2)] has moment generating function (at $\theta_1$) $M_0(t) = [\xi_1 e^{t/m} + (1 - \xi_1)e^{-t/m}]^m [\eta_1 e^{-t/n} + (1 - \eta_1)e^{t/n}]^n$. As in (3.2)-(3.3), $\rho_0 = \inf_t M_0(t) \ge \rho_1$, but equality would imply $\lambda = 1$ and hence is impossible by (3.12). As in the development of (2.2), it follows from (T) that


$N_\psi \ge \dfrac{\log \alpha - \log w}{\log(\rho_0 - \epsilon)}$ for all $\alpha < w(\rho_0 - \epsilon)^{k(\epsilon)}$.

Thus, asymptotically, $N_\psi \gtrsim \log \alpha / \log \rho_0$, and $N/N_\psi \le N_{\phi^*}/N_\psi \lesssim \log \rho_0 / \log \rho_1 < 1$, which completes the proof.

It should be added that for several binomial two-population problems, including the one of this example, tests of the form $\psi_k = 1$ if and only if $(\hat\xi - \hat\eta) > c_k$ turn out to be asymptotically efficient as $\delta \to 0$ [5].

EXAMPLE 2. Consider the problem of Example 1, modified only by taking $\omega_1 = \{\xi - \eta \le 0\}$ and specialized to the case $m = n = 1$.

We will be content to show that (b) and (c) of Theorem 2.1 are satisfied for the choice

$\theta_1 = (\frac12, \frac12)$, $\theta_2 = \left(\dfrac{1 + \delta}{2}, \dfrac{1 - \delta}{2}\right)$.
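For this choice we have (by specialization in Example 1)

$X(Y) = (U - V)\log\dfrac{1 + \delta}{1 - \delta} + \log(1 - \delta^2)$,

$M_\theta(t) = [\xi(1 + \delta)^t + (1 - \xi)(1 - \delta)^t][\eta(1 - \delta)^t + (1 - \eta)(1 + \delta)^t]$.

Since

$M_\theta(t) \le M_{(\eta, \eta)}(t) \le M_1(t)$ for $\theta \in \omega_1$, $t > 0$,
$M_\theta(t) \le M_{(\xi, \xi - \delta)}(t) \le M_2(t)$ for $\theta \in \omega_2$, $t < 0$,

(b) and (c) follow from (L) and (R), and the non-randomized test $\phi^*$, with $\phi_k^* = 1$ if, and only if,

$\hat\xi - \hat\eta > \dfrac{-\log(1 - \delta^2)}{\log(1 + \delta) - \log(1 - \delta)}$,

is asymptotically efficient.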
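The following is a minimal sketch of the test in Example 2. It assumes $k$ replicate pairs $(U_i, V_i)$ of single Bernoulli observations and uses the illustrative value $\delta = 0.5$ with $k = 3$. It computes the critical value and enumerates all outcomes to check that the rule agrees with $S_k > 0$ for $X(Y)$ as given above.

```python
import itertools
import math


def threshold(delta):
    """Critical value for xi_hat - eta_hat in Example 2 (m = n = 1)."""
    return -math.log(1 - delta**2) / (math.log(1 + delta) - math.log(1 - delta))


def s_k(us, vs, delta):
    """S_k = sum of X(Y) = (U - V) log[(1 + delta)/(1 - delta)] + log(1 - delta^2)."""
    c = math.log((1 + delta) / (1 - delta))
    return sum((u - v) * c + math.log(1 - delta**2) for u, v in zip(us, vs))


delta, k = 0.5, 3
print("threshold:", threshold(delta))
mismatches = 0
for us in itertools.product((0, 1), repeat=k):
    for vs in itertools.product((0, 1), repeat=k):
        by_sum = s_k(us, vs, delta) > 0
        by_rule = (sum(us) - sum(vs)) / k > threshold(delta)
        mismatches += by_sum != by_rule
print("disagreements:", mismatches)
```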


4. Acknowledgements. I am grateful to Professor Wassily Hoeffding who, as 
thesis advisor, proposed the problem and gave many helpful suggestions. I also 
owe much to the referee, who went to a great deal of trouble to point out ways of 
simplifying proofs and resolving various questions, and to Professor Herman 
Chernoff for his aid and encouragement. 


REFERENCES 


[1] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," Actualités Scientifiques et Industrielles, No. 736, Paris, 1938.

[2] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.

[3] W. Hoeffding, "The large sample power of tests based on permutations of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 169-192.

[4] W. Hoeffding and Joan R. Rosenblatt, "The efficiency of tests," Ann. Math. Stat., Vol. 26 (1955), pp. 52-68.

[5] J. H. MacKay, "On the efficiency of certain tests for 2 × 2 tables," Ph.D. thesis, University of North Carolina, 1956.





LINEAR ESTIMATION IN CENSORED SAMPLES FROM MULTIVARIATE 
NORMAL POPULATIONS 


By G. A. WATTERSON


Australian National University 


1. Summary. In this paper, the known methods of linear estimation are extended to various cases of censored samples from multivariate normal populations. The two estimators considered correspond to the minimum variance and the 'alternative' estimators treated by Gupta [3] and Sarhan and Greenberg [4], [5] for univariate samples. It is found that the 'alternative' estimator has important applications in multivariate samples, being easy to compute and of low variance.


2. Introduction. During recent years, the estimation of population parameters 
from censored samples has received considerable attention. Gupta [3] found 
both maximum likelihood and linear estimators for the mean and standard 
deviation of a univariate normal distribution using a sample from which a 
number of the largest observations had been censored. The maximum likelihood 
method of estimation was used in the more general case of censoring from a multi- 
variate distribution by Cohen [1], who laid emphasis on samples restricted to a 
fixed region of possible population values rather than on samples with a fixed 
number of observations missing; but his results also apply to the latter case 
after minor modifications (Watterson [8]).

The advantages of maximum likelihood estimators are well known, the most 
important being the properties of asymptotic efficiency and unbiasedness. But 
for estimation from small censored samples neither the bias nor the exact variance can be calculated for these estimators, and the computation of the estimates is laborious, involving iterative solution of the likelihood equations.

In contrast, linear estimation has the following advantages. Firstly, for most 
parameters it is possible to obtain unbiased estimators and to calculate their 
variances, and secondly, linear estimates are easy to compute once the coeffi- 
cients have been found. Further, for certain special cases Gupta has shown that 
linear estimators are not substantially less efficient than those obtained by maxi- 
mum likelihood. 

These advantages have motivated further study of linear estimators,
and Gupta’s original methods have been generalised to doubly censored samples 
(having both large and small values missing) from univariate normal and ex- 
ponential distributions (Sarhan and Greenberg [4], [5], [6]). The theory is here 
further extended to linear estimation from multivariate populations with censor- 
ing effective on interior as well as extreme variates in the sample. 


Received June 27, 1958; revised December 18, 1958. 







3. Censored samples. Consider a $k$-variate normal population, the moments of the variates being determined by the parameters

$\mu_i = E(x_i)$, $\sigma_{ij} = \mathrm{Cov}(x_i, x_j)$, $i, j = 1, 2, \ldots, k$.

From this population a sample of size $n$ is drawn, containing in all $n \times k$ variables. Suppose the sample is ordered with respect to one variate, say $x_1$, so that it may be represented by

$x_{11} < x_{12} < \cdots < x_{1n}$
$x_{2(1)}, x_{2(2)}, \ldots, x_{2(n)}$
$\vdots$
$x_{k(1)}, x_{k(2)}, \ldots, x_{k(n)}$,


where a bracket on the second subscript indicates the association of the variable 
with the corresponding ordered variable having no such bracket. Note that the 
associated variables are not necessarily in increasing order of magnitude. 

We define a censored sample as one having some or all observations missing from a number of the vectors of the ordered sample. For definiteness, censoring will be subdivided into three distinct types, each of which has practical applications.

Type A: Censoring effective on all variates of certain sample vectors, 

Type B: Censoring of associate variates only, 

Type C: Censoring of the ordered variate only. 

We shall not restrict the missing observations to the first or last vectors, but of course these cases are included in our formulation. The three types are illustrated below by samples from a bivariate distribution, each censored from one end only.

A. The 5 tallest trees out of a group of 20 are removed for milling. Their heights 
and mean diameters are measured so that an estimate of the volume of timber 
remaining in the group may be made. The two measured variates form a type A 
sample. 

B. The examination scores of 17 students are known, but only the best 12 
students are allowed to proceed to the next year of the course. The associated 
variate, the examination scores after one year’s study, is thus censored, but the 
ordered variate is completely known. 

C. Densities of several metal alloy specimens are measured, and each specimen is then subjected to fatigue testing. Supposing that the specimens are set in operation simultaneously, the population parameters may be estimated sequentially as the specimens fail, but at each stage the sample available will be censored with respect to the ordered variate 'time to failure', whilst the associated 'density' is completely known. In fact, the first estimates may be made after only two failures.


4. Linear estimation. For these types of samples, the estimation of all possible 
parameters may be inferred from the univariate and bivariate cases, which will 
therefore be treated in detail. 







(a) Univariate case. The estimation of the parameters $\mu_1$ and $\sqrt{\sigma_{11}}$ for the ordered variate has been carried out by Gupta [3] and Sarhan and Greenberg [4], [5] in the most usual cases of single and double censoring. For the ordered sample

$x_{11} < x_{12} < \cdots < x_{1n}$

define

(1) $u_l = \sigma_{11}^{-1/2} E(x_{1l} - \mu_1)$, $v_{lm} = \sigma_{11}^{-1} \mathrm{Cov}(x_{1l}, x_{1m})$

as being the means and covariances of ordered standard normal variables. The values of $u_l$ and $v_{lm}$ are tabulated by Teichroew [7] and Sarhan and Greenberg [4] respectively, for samples of size $n \le 20$. Suppose now that some of the ordered sample variables are missing; then a linear estimator will have the form $\sum a_l x_{1l}$, where the summation extends over all values of $l$ for which observations are available. (This summation convention will be adhered to throughout the paper.) The mean and variance of this estimator are, from (1),

(2) $E(\sum a_l x_{1l}) = \mu_1 \sum a_l + \sqrt{\sigma_{11}} \sum a_l u_l$,
(3) $\mathrm{Var}(\sum a_l x_{1l}) = \sigma_{11} \sum\sum a_l a_m v_{lm}$.


If we choose the $a_l$ so that $\sum a_l = 1$ and $\sum a_l u_l = 0$, then $\sum a_l x_{1l}$ is an unbiased estimator for $\mu_1$. Similarly, if $\sum a_l = 0$ and $\sum a_l u_l = 1$, then $\sum a_l x_{1l}$ is an unbiased estimator for $\sqrt{\sigma_{11}}$. These conditions do not in general determine the coefficients uniquely, and we may impose the further restriction that the variance (3) be made a minimum. Sarhan and Greenberg [4], [5] tabulate two sets of values $a_{1l}$ and $a_{2l}$ such that $\sum a_{1l} x_{1l}$ and $\sum a_{2l} x_{1l}$ are unbiased linear estimators for $\mu_1$ and $\sqrt{\sigma_{11}}$ of minimum variance, in the particular cases of singly and doubly censored samples of size $n \le 15$.
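Formulas (2) and (3) need only the $u_l$ and $v_{lm}$. For $n = 2$ these are known in closed form, which makes a convenient check. The following minimal sketch is our own illustration, not part of the paper. It evaluates (2) and (3) for the two coefficient vectors that satisfy the unbiasedness conditions for $\mu_1$ and $\sqrt{\sigma_{11}}$ in a complete sample of size 2, with hypothetical population values $\mu_1 = 3$ and $\sigma_{11} = 4$.

```python
import math

# Exact moments of the ordered standard normal sample of size n = 2:
# E(max) = 1/sqrt(pi), Var(min) = Var(max) = 1 - 1/pi, Cov(min, max) = 1/pi.
u = [-1 / math.sqrt(math.pi), 1 / math.sqrt(math.pi)]
v = [[1 - 1 / math.pi, 1 / math.pi],
     [1 / math.pi, 1 - 1 / math.pi]]


def moments(a, mu1, sigma11):
    """Mean (2) and variance (3) of sum a_l x_1l, given the u_l and v_lm above."""
    mean = mu1 * sum(a) + math.sqrt(sigma11) * sum(al * ul for al, ul in zip(a, u))
    var = sigma11 * sum(a[l] * a[m] * v[l][m]
                        for l in range(len(a)) for m in range(len(a)))
    return mean, var


# a = (1/2, 1/2): sum a_l = 1 and sum a_l u_l = 0, so the estimator is unbiased for mu_1.
print(moments([0.5, 0.5], mu1=3.0, sigma11=4.0))
# a = (-sqrt(pi)/2, sqrt(pi)/2): sum a_l = 0 and sum a_l u_l = 1, unbiased for sqrt(sigma_11).
s = math.sqrt(math.pi) / 2
print(moments([-s, s], mu1=3.0, sigma11=4.0))
```

The first variance is $\sigma_{11}/2$, as it must be for the mean of two observations. The second is $\sigma_{11}(\pi/2 - 1)$.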

For the more general case where observations may be missing from the interior of the sample, no tables have been constructed, since the number of possible sample types is of order $2^n$ for each $n$, and the task of computing and tabulating the coefficients even for reasonably small $n$ would be excessive. Instead, an alternative estimator can be constructed that has simple computational properties and is not much less efficient than the optimal one. Gupta [3] suggested for single censoring that, instead of minimising the variance (3), the coefficients obtained by minimising $\sum a_l^2$ subject to the unbiasedness conditions gave an estimator of low variance. In our case, suppose there are $p$ out of the $n$ variates observable, and write $\bar u = p^{-1} \sum u_l$. Then with

(4) $\beta_{1l} = \dfrac{1}{p} - \dfrac{\bar u (u_l - \bar u)}{\sum (u_m - \bar u)^2}$, $\beta_{2l} = \dfrac{u_l - \bar u}{\sum (u_m - \bar u)^2}$,

we find that $\sum \beta_{1l} x_{1l}$ is an unbiased estimator of $\mu_1$ and $\sum \beta_{2l} x_{1l}$ is an unbiased estimator of $\sqrt{\sigma_{11}}$. Moreover, $\sum \beta_{1l}^2$ and $\sum \beta_{2l}^2$ are minimum subject to $\sum \beta_{1l} = 1$, $\sum \beta_{1l} u_l = 0$ and $\sum \beta_{2l} = 0$, $\sum \beta_{2l} u_l = 1$, respectively. The relative efficiencies of these 'alternative' estimators compared with the best linear estimate can be found by comparing variances calculated according to (3). In the case of singly and doubly censored samples, Sarhan and Greenberg [4], [5] have tabulated these variances and efficiencies for samples up to size $n = 15$. The worst relative efficiency so obtained was for single censoring with $n = 15$, when the efficiency was not lower than 84.66% for the mean and not lower than 86.75% for the standard deviation. Clearly, the extra efficiency of the optimal estimator is hardly worth the effort of computation, and this may be even more pronounced in the general case.
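The coefficients (4) are explicit, so they are easy to compute for any censoring pattern, including interior gaps. The following is a minimal sketch. It assumes approximate tabulated means $u_l$ for $n = 5$ (five decimals) and a hypothetical pattern in which only the fourth ordered value is missing. It checks the four constraints and the least-squares property: adding any vector orthogonal to $1$ and $u$ leaves the constraints intact but increases $\sum \beta^2$.

```python
import math

# Approximate expected values of the ordered standard normal sample, n = 5
# (five decimals, as in standard tables such as [7]).
U_ALL = [-1.16296, -0.49502, 0.0, 0.49502, 1.16296]
OBS = [0, 1, 2, 4]  # hypothetical pattern: the fourth ordered value is missing


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def alternative_coefficients(u):
    """beta_1l and beta_2l of (4) for the observed positions."""
    p = len(u)
    ubar = sum(u) / p
    S = sum((x - ubar) ** 2 for x in u)
    beta1 = [1 / p - ubar * (x - ubar) / S for x in u]
    beta2 = [(x - ubar) / S for x in u]
    return beta1, beta2


u = [U_ALL[l] for l in OBS]
p = len(u)
beta1, beta2 = alternative_coefficients(u)
print("beta1:", [round(b, 4) for b in beta1], sum(beta1), dot(beta1, u))
print("beta2:", [round(b, 4) for b in beta2], sum(beta2), dot(beta2, u))

# Build w with sum w = 0 and sum w u = 0 by projecting a unit vector off span{1, u}.
# Each beta lies in span{1, u}, so sum (beta + w)^2 = sum beta^2 + sum w^2.
ubar = sum(u) / p
q = [x - ubar for x in u]
q_norm = math.sqrt(dot(q, q))
q = [x / q_norm for x in q]
e = [1.0] + [0.0] * (p - 1)
w = [ei - sum(e) / p - dot(e, q) * qi for ei, qi in zip(e, q)]
for beta in (beta1, beta2):
    perturbed = [b + 0.1 * wi for b, wi in zip(beta, w)]
    print(dot(beta, beta), "<", dot(perturbed, perturbed))
```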

(b) Bivariate case. For a bivariate normal population, there are five parameters requiring estimation, namely $\mu_1$, $\mu_2$, $\sqrt{\sigma_{11}}$, $\sqrt{\sigma_{22}}$ and $\sigma_{12}$. The estimation of $\mu_1$ and $\sqrt{\sigma_{11}}$ can be accomplished as in the univariate case, using all the $x_{1l}$ observations available. For type A and C samples only $p$ ($< n$) such observations can be used, whilst for type B samples, where the ordered variate is not censored, $n$ observations are available. In the latter case, $\mu_1$ is estimated by the arithmetic mean $n^{-1} \sum x_{1l}$, and $\sqrt{\sigma_{11}}$ can be linearly estimated by either a minimum variance or an alternative estimator. The method of estimating the remaining parameters $\mu_2$, $\sqrt{\sigma_{22}}$ and $\sigma_{12}$ depends on the type of sample considered.

Type A or B sample. In a sample where the associated variates are censored, 
we cannot re-arrange them into increasing order of magnitude because the ranks 
of the missing observations are not known. 

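From conditional expectations, or by direct evaluation of the moment generating function of an ordered sample, it may be shown that

(5) $E(x_{2(l)}) = \mu_2 + \sigma_{12}\sigma_{11}^{-1/2} u_l$,
$\mathrm{Var}(x_{2(l)}) = (1 - \sigma_{12}^2 \sigma_{11}^{-1}\sigma_{22}^{-1})\sigma_{22} + \sigma_{12}^2 \sigma_{11}^{-1} v_{ll}$,
$\mathrm{Cov}(x_{2(l)}, x_{2(m)}) = \sigma_{12}^2 \sigma_{11}^{-1} v_{lm}$, $l \ne m$,
$\mathrm{Cov}(x_{1l}, x_{2(m)}) = \sigma_{12} v_{lm}$.

Therefore a linear combination of the available observations has the moments

(6) $E\{\sum a_l x_{2(l)}\} = \mu_2 \sum a_l + \sigma_{12}\sigma_{11}^{-1/2} \sum a_l u_l$,
(7) $\mathrm{Var}\{\sum a_l x_{2(l)}\} = \sigma_{12}^2 \sigma_{11}^{-1} \sum\sum a_l a_m v_{lm} + (1 - \sigma_{12}^2\sigma_{11}^{-1}\sigma_{22}^{-1})\sigma_{22} \sum a_l^2$.

If the $a_l$ are chosen to satisfy $\sum a_l = 1$ and $\sum a_l u_l = 0$, then the linear combination is an unbiased estimator for $\mu_2$. Alternatively, if $\sum a_l = 0$ and $\sum a_l u_l = 1$, then the combination is an unbiased estimator for $\sigma_{12}\sigma_{11}^{-1/2}$. This latter quantity becomes $\rho_{12}\sqrt{\sigma_{22}}$ on the introduction of the correlation coefficient $\rho_{12} = \sigma_{12}\sigma_{11}^{-1/2}\sigma_{22}^{-1/2}$. Once more, additional restrictions may be made to determine the coefficients. The obvious criterion would be to minimise the variance in (7), which may be rewritten

(8) $\mathrm{Var}(\sum a_l x_{2(l)}) = \sigma_{22}\{\rho_{12}^2 \sum\sum a_l a_m v_{lm} + (1 - \rho_{12}^2)\sum a_l^2\}$.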


However, the resulting coefficients would be functions of $\rho_{12}$. For example, when $\rho_{12}^2 = 1$, the variance in (8) reduces to that of (3), so that $\sum a_{1l} x_{2(l)}$ and $\sum a_{2l} x_{2(l)}$ are the best linear unbiased estimators for $\mu_2$ and $\rho_{12}\sqrt{\sigma_{22}}$. By contrast, when $\rho_{12} = 0$ the minimum variance estimate of $\mu_2$ will simply be the arithmetic mean of the observed values, and the standard deviation will be best estimated by treating the values as if missing at random, ordering the remaining $x_2$ variables, and applying the univariate theory for a complete sample of size $p$.

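But in general the correlation will not be known, and we must therefore relax the restriction of minimum variance and instead seek estimators which have reasonably small variances for all possible values of the correlation. One such set of estimators is generated by the $a_{1l}$ and $a_{2l}$ considered before, because they are unbiased, and of minimum variance when $\rho_{12}^2 = 1$. The variances of these possible estimators are

(9) $\mathrm{Var}\{\sum a_{1l} x_{2(l)}\} = \sigma_{22}\{\rho_{12}^2 \sum\sum a_{1l} a_{1m} v_{lm} + (1 - \rho_{12}^2)\sum a_{1l}^2\}$,
$\mathrm{Var}\{\sum a_{2l} x_{2(l)}\} = \sigma_{22}\{\rho_{12}^2 \sum\sum a_{2l} a_{2m} v_{lm} + (1 - \rho_{12}^2)\sum a_{2l}^2\}$.

The alternative estimates based on the coefficients (4) will likewise be unbiased for $\mu_2$ and $\rho_{12}\sqrt{\sigma_{22}}$, and will have variances

(10) $\mathrm{Var}\{\sum \beta_{1l} x_{2(l)}\} = \sigma_{22}\{\rho_{12}^2 \sum\sum \beta_{1l}\beta_{1m} v_{lm} + (1 - \rho_{12}^2)\sum \beta_{1l}^2\}$,
$\mathrm{Var}\{\sum \beta_{2l} x_{2(l)}\} = \sigma_{22}\{\rho_{12}^2 \sum\sum \beta_{2l}\beta_{2m} v_{lm} + (1 - \rho_{12}^2)\sum \beta_{2l}^2\}$.

Comparing (9) with (10), the relative efficiency of $\sum \beta_{1l} x_{2(l)}$ to $\sum a_{1l} x_{2(l)}$ as an estimate of $\mu_2$ is

$E = \dfrac{\rho_{12}^2 \sum\sum a_{1l} a_{1m} v_{lm} + (1 - \rho_{12}^2)\sum a_{1l}^2}{\rho_{12}^2 \sum\sum \beta_{1l}\beta_{1m} v_{lm} + (1 - \rho_{12}^2)\sum \beta_{1l}^2}$,

and a similar expression holds for the estimates of $\rho_{12}\sqrt{\sigma_{22}}$. The minimum and maximum values of $E$ are given when $\rho_{12}^2 = 1$ and $\rho_{12} = 0$ respectively, and $E$ then takes the values

(11) $E_{\min} = \dfrac{\sum\sum a_{1l} a_{1m} v_{lm}}{\sum\sum \beta_{1l}\beta_{1m} v_{lm}}$, $E_{\max} = \dfrac{\sum a_{1l}^2}{\sum \beta_{1l}^2}$.

For equal efficiency ($E = 1$), $\rho_{12}^2$ has the value

(12) $\rho_{12}^2 = \left\{1 + \dfrac{1 - E_{\min}}{E_{\max} - 1} \cdot \dfrac{E_{\max}\sum\sum a_{1l} a_{1m} v_{lm}}{E_{\min}\sum a_{1l}^2}\right\}^{-1}$.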
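The quantities (11) and (12) can be computed once the $u_l$ and $v_{lm}$ are available. The following sketch is a minimal illustration, not the paper's own procedure. It takes approximate four-decimal values of $u_l$ and $v_{lm}$ for $n = 4$, which should be checked against the tables in [4] before any use, and treats the largest observation as censored ($r_1 = 0$, $r_2 = 1$). It obtains the minimum variance coefficients $a_{1l}$ by generalized least squares and the alternative $\beta_{1l}$ from (4). It then evaluates $E_{\min}$, $E_{\max}$ and the equal-efficiency $\rho_{12}^2$ of (12), and confirms that $E = 1$ there. The formulas hold for any positive definite $v_{lm}$, so small errors in the tabulated values do not affect the checks.

```python
# Approximate moments of ordered standard normal variables for n = 4.
# These four-decimal values are illustrative and should be checked against [4].
U_ALL = [-1.0294, -0.2970, 0.2970, 1.0294]
V_ALL = [[0.4917, 0.2456, 0.1609, 0.1018],
         [0.2456, 0.3605, 0.2330, 0.1609],
         [0.1609, 0.2330, 0.3605, 0.2456],
         [0.1018, 0.1609, 0.2456, 0.4917]]
OBS = [0, 1, 2]  # r1 = 0, r2 = 1: the largest value is censored


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def quad(a, V):
    """sum_l sum_m a_l a_m v_lm."""
    return sum(a[i] * V[i][j] * a[j] for i in range(len(a)) for j in range(len(a)))


u = [U_ALL[l] for l in OBS]
V = [[V_ALL[l][m] for m in OBS] for l in OBS]
p = len(u)

# Minimum variance unbiased coefficients for the location parameter (GLS):
# a = V^{-1} X (X' V^{-1} X)^{-1} e_1 with X = [1, u].
W1, Wu = solve(V, [1.0] * p), solve(V, u)
G = [[sum(W1), sum(Wu)], [dot(u, W1), dot(u, Wu)]]
c0, c1 = solve(G, [1.0, 0.0])
a1 = [c0 * w1 + c1 * wu for w1, wu in zip(W1, Wu)]

# Alternative coefficients beta_1l from (4).
ubar = sum(u) / p
S = sum((x - ubar) ** 2 for x in u)
b1 = [1 / p - ubar * (x - ubar) / S for x in u]


def efficiency(r2):
    """E as a function of rho_12^2: the ratio of the variances in (9) and (10)."""
    num = r2 * quad(a1, V) + (1 - r2) * dot(a1, a1)
    den = r2 * quad(b1, V) + (1 - r2) * dot(b1, b1)
    return num / den


E_min = quad(a1, V) / quad(b1, V)          # (11), rho_12^2 = 1
E_max = dot(a1, a1) / dot(b1, b1)          # (11), rho_12 = 0
rho2_eq = 1 / (1 + (1 - E_min) / (E_max - 1)
               * E_max * quad(a1, V) / (E_min * dot(a1, a1)))   # (12)
print("a1 =", [round(x, 4) for x in a1], " beta1 =", [round(x, 4) for x in b1])
print("E_min =", E_min, " E_max =", E_max, " |rho_12| for E = 1:", rho2_eq ** 0.5)
```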


Of course, $E_{\min}$ is also the relative efficiency of $\sum \beta_{1l} x_{1l}$ compared with $\sum a_{1l} x_{1l}$ as an estimate of $\mu_1$, and as such has been tabulated for doubly censored samples by Sarhan and Greenberg [4], [5] for values of $n$ up to 15. Using their tabulations and also some further calculations, we have found the values of $E_{\min}$, $E_{\max}$ and $|\rho_{12}|$ for equal efficiency in the case of doubly censored samples of size $n = 10$, where $r_1$ observations are missing from the left of the sample and $r_2$ from the right, leaving $p = n - r_1 - r_2$ central values of $x_2$ observable. Table 1 shows these







TABLE 1

Relative efficiency of $\sum \beta_{1l} x_{2(l)}$ compared with $\sum a_{1l} x_{2(l)}$ as estimates of $\mu_2$, showing minimum and maximum values, and the correlation required for equal efficiency. Doubly censored bivariate normal sample of type A or B, size $n = 10$





| 100.00 | 99.43 | 98.06 | 93.54 | 01.50, 
Emax | 100.00 | 105.09 | 119.22 148.27 | 124.64 | 100.00 
| prs | my -9541 | 9655 | | .9744 er _ 
Buin | | 99.04} 98.29 96.29 | | 100.00 | 
Emex | 108.56 | 118.93 134.34 | 100.00 | 

| prs | .9559 | 9689 9823 We 
Enis 98.20 | 97.85 | | | 
Emax | 123.03 | | 124.13 | 
lose! | .9745 .9867 | 











Enia | 99.03 | 
Emax 115.24 | 
| prs | | | 9908 | 








Enin | | | 100.00 
100.00 | 







Note: The non-entries (—) in the above table correspond to the fact that $E = 1$ for all values of $\rho_{12}$.


quantities for estimates of $\mu_2$, whilst Table 2 gives the similar quantities for estimates of $\rho_{12}\sqrt{\sigma_{22}}$. Clearly, unless $|\rho_{12}|$ is very near unity, the alternative estimators are more efficient than the original ones, and in any case they are never substantially less efficient.

To investigate the absolute efficiencies of these estimators of $\mu_2$ against all linear alternatives, we see from (9) and (10) that the estimators $\sum a_{1l} x_{2(l)}$ and $\sum \beta_{1l} x_{2(l)}$ will be least efficient when $\rho_{12} = 0$. In this case the best estimator is the arithmetic mean, with variance $p^{-1}\sigma_{22}$. Thus the least efficiency that the original estimator $\sum a_{1l} x_{2(l)}$ can have is $p^{-1}(\sum a_{1l}^2)^{-1}$, and for the alternative estimator $\sum \beta_{1l} x_{2(l)}$ the least efficiency is $p^{-1}(\sum \beta_{1l}^2)^{-1}$. In Table 3, these quantities are tabulated (as percentages) for the case $n = 10$ and all possible doubly censored samples.

Clearly, neither estimator is satisfactory when $\rho_{12} = 0$ unless the sample is almost complete ($r_1$ and $r_2$ small) or nearly symmetrically censored ($r_1 \approx r_2$). But without knowledge of $\rho_{12}$, no simple method of improving the estimators seems possible without going into the more complicated estimates deduced by maximum likelihood.





TABLE 2
Relative efficiency of $\sum \beta_{2l} x_{2(l)}$ compared with $\sum a_{2l} x_{2(l)}$ as estimates of $\rho_{12}\sqrt{\sigma_{22}}$, showing minimum and maximum values, and the correlation required for equal efficiency. Doubly censored bivariate normal sample of type A or B, size $n = 10$

r1 \ r2          0        1        2        3        4        5        6        7        8
0  E_min     99.87    96.92    94.07    92.03    90.72    90.17    90.66    92.97   100.00
   E_max    100.39   112.41   127.78   139.47   145.40   144.69   136.81   121.53   100.00
   |ρ12|     .9319    .9569    .9672    .9724    .9752    .9767    .9771    .9763        —
1  E_min              97.08    96.11    95.64    95.80    96.65    98.22   100.00
   E_max             113.45   119.22   122.02   120.27   114.02   104.78   100.00
   |ρ12|              .9719    .9790    .9834    .9862    .9877    .9863        —
2  E_min                       96.32    96.88    98.02    99.56   100.00
   E_max                      118.00   114.99   108.91   101.18   100.00
   |ρ12|                       .9843    .9881    .9909    .9899        —
3  E_min                                96.16    99.96   100.00
   E_max                               103.11   100.11   100.00
   |ρ12|                                .9544    .9913        —
4  E_min                                        100.00
   E_max                                        100.00
   |ρ12|                                             —

Note: The non-entries (—) in the above table correspond to the fact that $E = 1$ for all values of $\rho_{12}$.


TABLE 3

The Minimum Efficiencies of Original and Alternative Estimators of $\mu_2$ Against All Linear Estimators. Doubly censored bivariate normal samples of type A or B, size $n = 10$, $\rho_{12} = 0$

r1 \ r2                0        1        2        3        4        5        6        7        8
0  Original       100.00    90.64    68.99    47.38    31.34    20.57    13.39     8.35     4.28
   Alternative    100.00    95.26    82.25    64.52    46.47    31.01    19.09    10.41     4.28
1  Original                 92.11    78.55    56.41    36.08    21.32    11.26     4.16
   Alternative             100.00    93.42    73.77    48.47    26.89    12.47     4.16
2  Original                          81.28    68.83    44.93    22.30     6.87
   Alternative                      100.00    88.41    55.77    24.27     6.87
3  Original                                   78.54    62.93    20.50
   Alternative                               100.00    72.52    20.50
4  Original                                           100.00
   Alternative                                        100.00







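It is interesting to consider the product of the estimators for $\sqrt{\sigma_{11}}$ and $\sigma_{12}\sigma_{11}^{-1/2}$, namely

$\sum \beta_{2l} x_{1l} \cdot \sum \beta_{2m} x_{2(m)} = \sum\sum \beta_{2l}\beta_{2m} x_{1l} x_{2(m)}$.

From (5) we have

(13) $E\{\sum\sum \beta_{2l}\beta_{2m} x_{1l} x_{2(m)}\} = \sum\sum \beta_{2l}\beta_{2m}\{\sigma_{12} v_{lm} + (\mu_1 + u_l\sqrt{\sigma_{11}})(\mu_2 + u_m \sigma_{12}\sigma_{11}^{-1/2})\}$.

But also $\sum \beta_{2l} = 0$ and $\sum \beta_{2l} u_l = 1$, so that on summation (13) becomes

$E\{\sum\sum \beta_{2l}\beta_{2m} x_{1l} x_{2(m)}\} = \sigma_{12}(1 + \sum\sum \beta_{2l}\beta_{2m} v_{lm})$.

We have thus found an explicit unbiased estimator for $\sigma_{12}$, namely

$\sum \beta_{2l} x_{1l} \cdot \sum \beta_{2m} x_{2(m)} \cdot (1 + \sum\sum \beta_{2l}\beta_{2m} v_{lm})^{-1}$,

but, of course, this is not strictly a linear estimator. The coefficients $\beta_{2l}$ may be replaced by the original ones $a_{2l}$.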
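The bias-correcting factor $1 + \sum\sum \beta_{2l}\beta_{2m} v_{lm}$ can be checked by simulation. In the smallest case, $n = 2$ with no censoring, the $u_l$ and $v_{lm}$ are known exactly and the factor equals $\pi/2$. The following sketch is our own illustration. It uses an arbitrary seed and the population values $\mu_1 = \mu_2 = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\rho_{12} = 0.6$, and compares the uncorrected product of estimators with the corrected estimator of $\sigma_{12}$.

```python
import math
import random

# Exact n = 2 quantities: u = (-1/sqrt(pi), 1/sqrt(pi)), beta_2 from (4).
u = [-1 / math.sqrt(math.pi), 1 / math.sqrt(math.pi)]
v = [[1 - 1 / math.pi, 1 / math.pi], [1 / math.pi, 1 - 1 / math.pi]]
S = sum(x ** 2 for x in u)  # ubar = 0 here
beta2 = [x / S for x in u]
correction = 1 + sum(beta2[l] * beta2[m] * v[l][m] for l in range(2) for m in range(2))


def one_sample(rng, rho):
    """Draw n = 2 pairs from a standard bivariate normal with correlation rho, ordered by x1."""
    pairs = []
    for _ in range(2):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((z1, z2))
    pairs.sort(key=lambda pr: pr[0])  # x_1l ordered, associated x_2(l) carried along
    return [pr[0] for pr in pairs], [pr[1] for pr in pairs]


rng = random.Random(12345)
rho, reps = 0.6, 100_000
naive = unbiased = 0.0
for _ in range(reps):
    x1, x2 = one_sample(rng, rho)
    prod = dot1 = sum(b * x for b, x in zip(beta2, x1)) * sum(b * x for b, x in zip(beta2, x2))
    naive += prod
    unbiased += prod / correction
print("correction factor:", correction, " pi/2 =", math.pi / 2)
print("mean of product:", naive / reps, " mean of corrected estimator:", unbiased / reps)
```

The uncorrected product averages about $\sigma_{12}\pi/2$. The corrected estimator averages about $\sigma_{12} = 0.6$, up to Monte Carlo error.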

In summary, the above theory allows us to find unbiased estimators for $\mu_1$, $\mu_2$, $\sqrt{\sigma_{11}}$, $\sigma_{12}\sigma_{11}^{-1/2} = \rho_{12}\sqrt{\sigma_{22}}$, and $\sigma_{12}$. It should be noted that the quantities

$\sum\sum a_{1l} a_{1m} v_{lm}$, $\sum\sum a_{2l} a_{2m} v_{lm}$, $\sum\sum \beta_{1l}\beta_{1m} v_{lm}$, $\sum\sum \beta_{2l}\beta_{2m} v_{lm}$,

which occur in many of the equations, are tabulated by Sarhan and Greenberg [4], [5] for some doubly censored samples of size $n \le 15$. This facilitates the calculation of the variances of the estimators.

Type C samples. A bivariate type C sample will have $p$ observed variables $x_{1l}$ and $n$ associated variables $x_{2(l)}$. There are two distinct cases possible here, because there are $n - p$ $x_2$-variables associated with missing $x_1$ values, and it might or might not be known what rank these have in the ordered sample. An example of the latter case was given in §3, example C, where ranks cannot be assigned to the specimens which have not yet failed. By slightly changing example A of §3 we can illustrate the former case. Suppose that, as well as measuring the heights and mean diameters of the 5 tallest trees, we also know the ordering (but not the exact values) of the heights of the remaining 15 trees and their associated exact diameters. Then clearly the positions of the associated variables in the ordered sample are known. The estimation of the parameters $\rho_{12}\sqrt{\sigma_{22}}$ or $\sigma_{12}$ will depend on which case is available.

Consider first the estimation of $\mu_2$ and $\sqrt{\sigma_{22}}$. Because all variables $x_{2(l)}$ are known, we may re-order them into increasing order of magnitude, say

$x_{21} < x_{22} < \cdots < x_{2n}$,

where we have now dropped the bracket from the second subscript to indicate strict ordering. Obviously the best estimator for $\mu_2$ is the arithmetic mean $n^{-1}\sum x_{2l}$, but there are two possibilities, $\sum a_{2l} x_{2l}$ and $\sum \beta_{2l} x_{2l}$, for estimating $\sqrt{\sigma_{22}}$. The former is of minimum variance, whilst the latter has easily calculated coefficients. Note that we could not estimate $\sqrt{\sigma_{22}}$ for either type A or B samples.

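Coming to the problem of estimating $\rho_{12}\sqrt{\sigma_{22}}$: if we do know the position of all the variables $x_{2(l)}$ in the ordered sample, then we can proceed as before and estimate $\rho_{12}\sqrt{\sigma_{22}}$ by $\sum a_{2l} x_{2(l)}$ or $\sum \beta_{2l} x_{2(l)}$. We can also estimate $\sigma_{12}$ by $\sum \beta_{2m} x_{1m} \cdot \sum \beta_{2l} x_{2(l)} \cdot (1 + \sum\sum \beta_{2m}\beta_{2l} v_{lm})^{-1}$, or with $a_{2l}$ instead of $\beta_{2l}$. Here, the summation over $m$ is for the $p$ uncensored $x_1$ values. As we can also estimate $\sqrt{\sigma_{22}}$ for this type of sample, we can estimate $\rho_{12}$ directly by

$\sum a_{2l} x_{2(l)} \{\sum a_{2l} x_{2l}\}^{-1}$ or $\sum \beta_{2l} x_{2(l)} \{\sum \beta_{2l} x_{2l}\}^{-1}$,

but these estimators are biased. On the other hand, if not all positions for the $x_2$ variables can be assigned in the ordered sample, we can proceed by disregarding those of doubtful rank and treating the sample as if it were of type A. Thus $\sum' a'_{2l} x_{2(l)}$ and $\sum' \beta'_{2l} x_{2(l)}$ are unbiased estimators for $\rho_{12}\sqrt{\sigma_{22}}$, and

$\dfrac{\sum' a'_{2l} x_{2(l)} \sum a_{2m} x_{1m}}{1 + \sum\sum' a'_{2l} a_{2m} v_{lm}}$ and $\dfrac{\sum' \beta'_{2l} x_{2(l)} \sum \beta_{2m} x_{1m}}{1 + \sum\sum' \beta'_{2l}\beta_{2m} v_{lm}}$

are unbiased estimators for $\sigma_{12}$. Here, the dash on the summation and the coefficients indicates that variates of unknown rank are disregarded. Also,

$\sum' a'_{2l} x_{2(l)} \{\sum a_{2l} x_{2l}\}^{-1}$ and $\sum' \beta'_{2l} x_{2(l)} \{\sum \beta_{2l} x_{2l}\}^{-1}$

will be (biased) estimates of $\rho_{12}$.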

(c) Multivariate case. For a multivariate sample of type A or B, the theory deduced for bivariate samples may be applied to each pair of variates $x_1$, $x_j$, and most of the parameters may be estimated by linear estimators or their combinations. In addition, for type C samples the bivariate theory may be applied to all variate pairs $x_i$, $x_j$. When two associated variates are considered, they form a complete bivariate sample, and this may be ordered with respect to each variate in turn, thus providing estimates for all parameters of the population.

It is clear from the efficiencies given in Tables 1 and 2 that, for multivariate (as well as bivariate) populations, the alternative estimators based on the coefficients $\beta_{1l}$, $\beta_{2l}$ defined in (4) have three merits. They are simply calculated, they are generally more efficient than the original ones except for estimating the parameters $\mu_1$ and $\sqrt{\sigma_{11}}$, and they have a high absolute efficiency (compared with maximum likelihood estimators) when not many sample elements are censored (see Table 3 and Sarhan and Greenberg [4], [5]). Therefore they can be recommended as a satisfactory solution to the problem of estimation from a multivariate normal censored sample.


5. Example. We illustrate the above methods of estimation applied to a type C censored sample drawn from a bivariate normal population with parameters

$\mu_1 = \mu_2 = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\sigma_{12} = \rho_{12} = 0.6$.

A sample of size $n = 10$ was drawn from the tables of 'Correlated Random Normal Deviates' of Fieller, Lewis and Pearson [2]. When ordered with respect to $x_1$, it is







3 ee 6 7 | 8 9 10 


— 


Z 
| 
a | 
| 3 


2.13* 
1.01. 


re oq 
0.48 | 0.64 | 1.20" 
2.40 © | 0.66 | 3.08 


Tu 


0.39 
1.40 


| 
| 
fr 0. 
60| 0 -0. 


%2(1) 





The starred variables will be assumed missing for the purposes of the example, and thus a doubly censored sample of type C results. We assume that we know the ranking of the associated variables −0.16, 2.03, 1.01. The ordered values of





| 


0.30 | 0.60 


5 6 clon 


9 | 10 


—0.16 


In Table 4 we show the original and alternative estimates of the various parameters and, for the strictly linear estimators, their variances calculated according to (3) and (7).


Lai 1.01 | 1.40 03 | 2.40. 


| 
B 


| 
| 


| | 
| | 0.65 


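TABLE 4

Parameter Estimates for a Type C, Doubly Censored Bivariate Sample

Parameter | Original estimator | Estimate | Variance (No. of Obs.) | Alternative estimator | Estimate | Variance (No. of Obs.)
$\mu_1$ | $\sum a_{1l} x_{1l}$ | 0.1298 | 0.1085 (7) | $\sum \beta_{1l} x_{1l}$ | 0.1682 | 0.1103 (7)
$\mu_2$ | $n^{-1}\sum x_{2l}$ | 0.6260 | 0.1000 (10) | $n^{-1}\sum x_{2l}$ | 0.6260 | 0.1000 (10)
$\sqrt{\sigma_{11}}$ | $\sum a_{2l} x_{1l}$ | 0.9263 | 0.1014 (7) | $\sum \beta_{2l} x_{1l}$ | 0.9961 | 0.1055 (7)
$\sqrt{\sigma_{22}}$ | $\sum a_{2l} x_{2l}$ | 1.2391 | 0.0576 (10) | $\sum \beta_{2l} x_{2l}$ | 1.2369 | 0.0577 (10)
$\rho_{12}\sqrt{\sigma_{22}}$ | $\sum a_{2l} x_{2(l)}$ | 0.7189 | 0.1019 (10) | $\sum \beta_{2l} x_{2(l)}$ | 0.7461 | 0.1016 (10)
$\sigma_{12}$ | $\sum a_{2l} x_{2(l)} \sum a_{2m} x_{1m} / (1 + \sum\sum a_{2l} a_{2m} v_{lm})$ | 0.6297 | | $\sum \beta_{2l} x_{2(l)} \sum \beta_{2m} x_{1m} / (1 + \sum\sum \beta_{2l}\beta_{2m} v_{lm})$ | 0.7019 |
$\rho_{12}$ | $\sum a_{2l} x_{2(l)} / \sum a_{2l} x_{2l}$ | 0.5802 | | $\sum \beta_{2l} x_{2(l)} / \sum \beta_{2l} x_{2l}$ | 0.6032 |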


As expected, the original and alternative estimates are similar in both value and variance. However, the sample seems rather extreme with respect to the deviations of $x_2$ from its mean of zero.


Acknowledgement. I wish to thank Professor H. A. David for suggesting this 
problem and for much helpful discussion and criticism. I am also grateful to the 
referee for suggesting the generalization from doubly censored samples to 
samples with interior variables missing, and for other helpful remarks. 


REFERENCES

[1] A. C. Cohen, Jr., "Restriction and selection in multinormal distributions," Ann. Math. Stat., Vol. 28 (1957), pp. 731-741.





824 G. A. WATTERSON 


[2] E. C. Fieller, T. Lewis, and E. S. Pearson, "Correlated random normal deviates," Tracts for Computers, No. XXVI, Cambridge University Press, Cambridge, 1955.

[3] A. K. Gupta, "Estimation of the mean and standard deviation of a normal population from a censored sample," Biometrika, Vol. 39 (1952), pp. 260-273.

[4] A. E. Sarhan and B. G. Greenberg, "Estimation of location and scale parameters by order statistics from singly and doubly censored samples, Pt. I," Ann. Math. Stat., Vol. 27 (1956), pp. 427-451.

[5] A. E. Sarhan and B. G. Greenberg, "Estimation of location and scale parameters by order statistics from singly and doubly censored samples, Pt. II," Ann. Math. Stat., Vol. 29 (1958), pp. 79-105.

[6] A. E. Sarhan and B. G. Greenberg, "Tables for best estimates by order statistics of the parameters of single exponential distributions from singly and doubly censored samples," J. Amer. Stat. Assn., Vol. 52 (1957), pp. 58-87.

[7] D. Teichroew, "Tables of expected values of order statistics and products of order statistics for samples of size twenty and less from the normal distribution," Ann. Math. Stat., Vol. 27 (1956), pp. 410-426.

[8] G. A. Watterson, "Ordered and censored samples from a multivariate normal population," Master's thesis, University of Melbourne, unpublished, 1958.





NOTES 


THE EXPRESSION OF k-STATISTIC $k_{11}$ IN TERMS OF POWER SUMS
AND SAMPLE MOMENTS


By M. Ziaud-Din
Panjab University, Lahore


The values of $k_9$ and $k_{10}$ in terms of products of the power sums $s_r$ ($s_r = \sum x^r$) have been published by the author [7]. In this paper $k_{11}$ is expressed in terms of $s_r$-products and in terms of sample moments. It is computed with the help of the tables of generalized k-statistics constructed by Abdel-Aty [1]. From these tables

$$k_{11} = \frac{3628800}{n^{(11)}}[1^{11}] - \frac{19958400}{n^{(10)}}[21^{9}] + \cdots + \frac{[11]}{n},$$


where $n^{(11)} = n(n-1)(n-2)\cdots(n-10)$, and $[1^{11}]$, $[21^{9}]$, etc., are the augmented symmetric functions. These can be expressed in terms of $s_r$-products by means of the tables of symmetric functions given by David and Kendall [2]. Collecting all terms, $k_{11}$ is expressed in terms of $s_r$-products.
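The passage from augmented symmetric functions to a k-statistic can be checked numerically on small samples. The Python sketch below is our illustration, not the author's procedure. It rests on an assumed general coefficient rule: $(-1)^{\rho-1}(\rho-1)!$ times the number of ways of dividing $r$ labelled items into blocks with the sizes in the partition, all divided by $n^{(\rho)}$. For $r = 11$ this rule reproduces the two coefficients 3628800 and 19958400 displayed above. The augmented symmetric functions are summed by brute force over ordered tuples of distinct indices, so the sketch is practical only for small $n$ and $r$. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial, prod


def partitions(r, max_part=None):
    """Yield the integer partitions of r as non-increasing tuples, e.g. (2, 1, 1)."""
    if max_part is None:
        max_part = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, max_part), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def falling(n, k):
    """Falling factorial n^(k) = n(n-1)...(n-k+1)."""
    return prod(range(n - k + 1, n + 1))


def augmented(parts, xs):
    """Augmented symmetric function [p1 p2 ...]: sum of x_i^p1 * x_j^p2 * ...
    over ordered tuples (i, j, ...) of distinct indices (brute force)."""
    return sum(prod(xs[i] ** p for i, p in zip(idx, parts))
               for idx in permutations(range(len(xs)), len(parts)))


def block_count(parts):
    """Ways to split sum(parts) labelled items into unlabelled blocks of these sizes."""
    denom = prod(factorial(p) for p in parts)
    denom *= prod(factorial(m) for m in Counter(parts).values())
    return factorial(sum(parts)) // denom


def k_statistic(r, data):
    """k_r as a sum over partitions pi of r of
    (-1)^(rho-1) (rho-1)! * block_count(pi) * [pi] / n^(rho), in exact arithmetic."""
    xs = [Fraction(x) for x in data]
    n = len(xs)
    total = Fraction(0)
    for parts in partitions(r):
        rho = len(parts)
        if rho > n:  # [pi] is an empty sum when it needs more indices than observations
            continue
        coef = (-1) ** (rho - 1) * factorial(rho - 1) * block_count(parts)
        total += Fraction(coef, falling(n, rho)) * augmented(parts, xs)
    return total


data = [2, 3, 5, 7, 11]
print(k_statistic(2, data))  # 64/5, the usual unbiased sample variance
```

Exact fractions are used so that identities such as $k_2 = $ (unbiased variance) hold exactly rather than to rounding error. The brute-force sum grows very quickly with $n$ and $r$.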

As a check, the sum of the coefficients of all the $s_r$-products is $1/n$.

$k_{11}$ is obtained in terms of the sample moments $m_r$ by putting $s_1 = 0$ and $s_r = n m_r$ $(r > 1)$. Thus $s_{11} = n m_{11}$, $s_3 s_2^4 = n m_3 (n m_2)^4 = n^5 m_3 m_2^4$, etc.
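The substitution $s_1 = 0$, $s_r = n m_r$ can be illustrated on the low-order k-statistics. The sketch below starts from the standard power-sum expressions for $k_2$, $k_3$ and $k_4$. These are well-known formulas and are not taken from this note. The substitution is legitimate because $k_r$ ($r \ge 2$) is unchanged when the observations are measured from their mean. The function names are hypothetical.

```python
from fractions import Fraction


def central_moments(data, max_r):
    """Sample moments about the mean, m_r = (1/n) * sum (x - xbar)^r, for r = 2..max_r."""
    xs = [Fraction(x) for x in data]
    n = len(xs)
    xbar = sum(xs) / n
    return {r: sum((x - xbar) ** r for x in xs) / n for r in range(2, max_r + 1)}


def k2_k3_k4_from_moments(data):
    """k2, k3, k4 obtained by putting s1 = 0 and s_r = n*m_r into the power-sum formulas
    k2 = (n s2 - s1^2) / n^(2),  k3 = (n^2 s3 - 3n s1 s2 + 2 s1^3) / n^(3), and the
    analogous formula for k4, of which only the s2^2 and s4 terms survive when s1 = 0."""
    n = len(data)
    m = central_moments(data, 4)
    k2 = n * m[2] / (n - 1)
    k3 = n ** 2 * m[3] / ((n - 1) * (n - 2))
    k4 = n ** 2 * ((n + 1) * m[4] - 3 * (n - 1) * m[2] ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return k2, k3, k4


print(k2_k3_k4_from_moments([1, 2, 3, 4, 5]))
# (Fraction(5, 2), Fraction(0, 1), Fraction(-15, 2))
```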

The k-statistics have recently been applied in various fields by several writers, such as Tukey [6], Hooke [4] and Robson [5]. They are of interest to workers in the theory of sampling distributions and moment statistics. As indicated by Dressel [3], they are also related to certain aspects of the theory of numbers and combinatory analysis.


ky = [3628800s;' — 19958400ns28; 


ai 
+ 39916800(n? — n)s3s; — 34927200(n* — 3n” + 2n)sdsi 
+ 12474000(n‘ — 6n* + L1n’® — 6n) 83st 

1247400(n° — 10n‘ + 35n* — 50n’ + 24n)s)s; + 6652800(n* + 8n) 538; 

23284800(n* + 5n* — 6n) 89828¢ 

+ 24948000(n‘ + n> — 10n® + 8n)s38281 

— 8316000(n° — 4n‘ — n® + 16n” — 12n)s,s281 
+ 415800(n° — 10n> + 35n‘ — 50n* + 24n”) 8382 
+ 3326400(n‘ + 8n* + 25n” — 34n)s3s1 


Received April 11, 1958; revised June 27, 1958.


5544000(n* + 2n‘ + 5n*® — 44n? + 36n)s3s08} 

+ 1663200(n° — 5n° + 15n* — 55n* + 104n®? — 60n)s3s35, 
+ 369600(n° + 25n* — 30n* — 116n* + 120n)s3s} 

— 92400(n’ — 9n° + 55n*® — 195n* + 304n* — 156n”)s}s0 
1663200(n* + 29n? + 42n)s4s1 


+ 4989600(n* + 22n*° — 17n? — 6n)sys281 
4158000(n° + 14n* — 67n*® + 88n? — 36n) sys3si 
+ 831600(n° + 5n® — 85n* + 295n* — 396n* + 180n) s4898; 
1386000(n° + 20n‘ + 65n* — 14n’ — 72n) 848981 
+ 1663200(n° + 10n® — 15n* — 40n* + 44n”) 54835287 
— 207900(n’ — n° — 25n® + 85n* — 96n* + 36n”) 548383 
— 138600(n’ + 3n° + 15n* + 25n* — 256n* + 212n’)s4835; 
+ 138600(n° + 25n* + 225n‘ — 385n* + 854n? — 720n)sis} 
— 103950(n’ + 11n° + 55n° — 655n* 


+ 2104n* — 2956n? + 1440n)sisos, 
+ 11550(n® — n’ + 67n*° — 355n° + 1084n* — 1804n* + 1008n”) sis, 
+ 332640(n* + 71n® + 396n” + 36n)sss{ 
— 831600(n° + 56n‘* + 101n* — 374n® + 216n)s5s28} 
+ 498960(n° + 40n° — 135n* — 70n* + 524n? — 360n) 558381 
41580(n’ + 23n° — 265n* + 925n* — 1296n* + 612n”*)s,82 
+ 221760(n° + 45n° + 145n* + 435n* — 1346n” + 720n) 558381 
166320(n’ + 27n° — 105n° + 385n‘ 


— 1576n*® + 2708n” — 1440n) 3583528; 
+9240(n® + 11n’ — 89n° + 785n* — 2936n* + 4244n* — 2016n") 5583 


— 41580(n’ + 39n° + 395n° — 115n* — 396n* + 1516n? — 1440n) sss8; 
+ 13860(n*° + 17n’ + 49n° — 805n*° + 2614n‘* — 3532n* + 1656n”) spsuse 
+ 2772(n*® + 38n’ + 652n° — 1510n* + 9199n' 

— 33088n* + 54948n” — 30240n)sis; 
— 55440(n° + 146n* + 1871n* + 2086n” — 1080n)s6s} 
+ 110880(n° + 115n° + 765n* — 2095n* + 854n? + 360m) ses2si 







41580(n' + 83n° — 65n* — 1975n* + 6544n* — 7468n” + 2880n) sos3s; 
27720(n’ + 87n° + 395n° + 1085n‘ — 1836n* — 4052n” + 4320n) 568387 
9240(n* + 53n’ — 275n* + 1175n° — 4406n‘* + 7412n* — 3960n”) sesys0 
4620(n* + 62n’ + 484n° + 1970n° — 13001n‘ 
+ 41168n* — 60924n? + 30240n) sesys; 
462(n° + 38n*° + 652n’ — 1510n*® + 9199n° 
— 33088n‘ + 54948n° — 30240n”) sess 
7920(n° + 270n*® + 6295n‘ + 18810n* — 8816n? — 1440n)s;s! 
11880(n’ + 207n° + 2795n° — 5515n‘ 
— 7836n* — 20428n” — 10080n)s;s28; 
1980(n* + 143n’ + 355n*° — 8275n° 
+ 28444n‘ — 37228n* + 16560n’)s,s3 
2640(n* + 146n’ + 976n° + 110n*® + 11899n* 
— 60736n* + 77844n* — 30240n) 7338; 
330(n? + 86n* + 316n’ + 5450n° — 35201n” 
+ 115424n* — 176796n* + 90720n’) s,s, 
990(n’ + 459n° + 16795n° + 91785n‘ 
— 11756n* — 67044n’ + 30240n) sss} 
990(n* + 332n’ + 7054n° — 8380n° — 63851n‘ 
+ 159248n* — 124644n* + 30240n) 55525; 
165(n* + 206n* + 1636n’ — 5230n*° + 58999n* 
— 236896n* + 332484n* — 151200n”) s,s, 
110(n*® + 713n’ + 36277n° + 292115n*° + 92434n‘ 
— 519628n* + 340008n* — 60480n) s9s} 
55(n® + 458n*® + 12472n’ — 11530n*° — 186701n* 
+ 555392n* — 581772n* + 211680n”) sos. 
11(n’ + 968n* + 60082n’ + 595760n*° + 371569n° 
— 1594648n‘ + 1261788n* — 332640n”) s,s; 
(n” + 968n° + 60082n* + 595760n’ + 371569n° 
—1594648n° + 1261788n* — 332640n*)s,)}. 
REFERENCES

[1] Abdel-Aty, "Tables of generalised k-statistics," Biometrika, Vol. 41 (1954), pp. 253-260.

[2] F. N. David and M. G. Kendall, "Tables of symmetric functions," Biometrika, Vol. 36 (1949), pp. 431-441.

[3] P. L. Dressel, "Statistical semi-invariants and their estimates with particular emphasis on their relation to algebraic invariants," Ann. Math. Stat., Vol. 11 (1940), pp. 33-57.

[4] R. Hooke, "Some applications of bipolykays to the estimation of variance components and their moments," Ann. Math. Stat., Vol. 27 (1956), pp. 80-98.

[5] D. S. Robson, "Application of multivariate polykays to the theory of unbiased ratio-type estimation," J. Amer. Stat. Assn., Vol. 52 (1957), pp. 511-522.

[6] J. W. Tukey, "Variances of variance components, I. Balanced designs," Ann. Math. Stat., Vol. 27 (1956), pp. 722-736.

[7] M. Ziaud-Din, "Expression of the k-statistics $k_9$ and $k_{10}$ in terms of power sums and sample moments," Ann. Math. Stat., Vol. 25 (1954), pp. 800-803.




A GENERALIZATION OF THE GLIVENKO-CANTELLI THEOREM
By Howard G. Tucker
University of California, Riverside


A theorem referred to as the Glivenko theorem or the Glivenko-Cantelli theorem states the following. If $X_1, X_2, \dots, X_n, \dots$ is a sequence of independent, identically distributed random variables with any common distribution function $F(x)$, then the sequence $\{F_n(x)\}$ of empirical distribution functions converges uniformly to $F(x)$ with probability one (see Loève [3] and Gnedenko [2]). The assumption of independence is not necessary for this theorem. It is readily observed that the same conclusion holds if the sequence of random variables is strictly stationary and ergodic (or metrically transitive). The purpose of this note is to prove a generalization of this theorem to the case where the sequence of random variables is strictly stationary but not necessarily ergodic. As before, the common distribution function is arbitrary.

It is assumed that the reader is familiar with strictly stationary stochastic processes (with discrete time). The reader should also be acquainted with the measure-preserving set transformation determined by the process, and with the random variable transformation determined by this set transformation. Information on these concepts is available in Doob [1] and Loève [3]. The principal result used in the proof of the theorem is the ergodic theorem for random variables (see Loève [3], p. 434), which can be stated as follows:


Let $S$ be a measure-preserving set transformation over the probability space $(\Omega, \mathcal{A}, P)$, let $T$ be the random variable transformation determined by $S$, and let $\mathfrak{I}$ be the invariant sub-sigma-field of $\mathcal{A}$ determined by $S$. If $X$ is any random variable for which $E|X| < \infty$, then

$$P\left\{ n^{-1}\left(X + TX + \cdots + T^{n-1}X\right) \to E(X \mid \mathfrak{I}) \right\} = 1.$$


By means of the ergodic theorem in this form the following theorem is obtained. 


Received August 25, 1958; revised February 25, 1959. 







THEOREM: Suppose $\{X_n\}$ is any strictly stationary sequence of random variables, $\mathfrak{I}$ is the invariant sigma-field of events determined by it, and $\{F_n(x)\}$ denotes the associated sequence of empirical distribution functions. Then

$$P\left\{ \sup_{-\infty < x < +\infty} \left| F_n(x) - F(x \mid \mathfrak{I}) \right| \to 0 \right\} = 1,$$

where $F(x \mid \mathfrak{I})$ denotes the conditional distribution function of $X_1$ given $\mathfrak{I}$.
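The difference between $F(x)$ and $F(x \mid \mathfrak{I})$ can be seen in a simple non-ergodic stationary sequence. The construction is our illustration, not the note's. Take $X_i = \theta + e_i$, where $\theta$ is drawn once from $\{0, 2\}$ with equal probabilities and the $e_i$ are i.i.d. $N(0, 1)$. The sequence is exchangeable, hence strictly stationary, but it is not ergodic. Here $F(x \mid \mathfrak{I})$ is the $N(\theta, 1)$ distribution function, while the marginal $F(x)$ is the equal mixture of $N(0, 1)$ and $N(2, 1)$. The following is a minimal Python sketch with hypothetical names.

```python
import random
from statistics import NormalDist

STD = NormalDist(0.0, 1.0)
SHIFTED = NormalDist(2.0, 1.0)


def sup_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)| for a continuous F, checked on both sides of each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def mixture_cdf(x):
    """Unconditional (marginal) distribution function of X_1: an equal mixture."""
    return 0.5 * STD.cdf(x) + 0.5 * SHIFTED.cdf(x)


def one_path(n, seed):
    """One realisation of X_i = theta + e_i, with theta drawn once from {0, 2}."""
    rng = random.Random(seed)
    theta = rng.choice([0.0, 2.0])
    sample = [theta + rng.gauss(0.0, 1.0) for _ in range(n)]
    conditional_cdf = STD.cdf if theta == 0.0 else SHIFTED.cdf
    return theta, sup_distance(sample, conditional_cdf), sup_distance(sample, mixture_cdf)


for seed in range(3):
    theta, d_cond, d_marg = one_path(20000, seed)
    print(f"theta={theta}: sup|Fn - F(.|I)| = {d_cond:.4f}, sup|Fn - F| = {d_marg:.4f}")
```

The first distance shrinks as $n$ grows, as the theorem asserts. The second stays near $\tfrac12(\Phi(1) - \Phi(-1)) \approx 0.34$, because $F_n$ follows the value of $\theta$ on its own path rather than the mixture.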

PROOF: In the proof that follows, all equalities and inequalities between random variables, and all limits of sequences of random variables, are to be understood to hold with probability one. Also, equality between events means that their symmetric difference is an event of probability zero. Let $j$ and $k$ be two arbitrary fixed integers for which $0 \le j \le k$. We define a random variable $X_{jk}$ by

$$X_{jk} = \inf\{ s \mid s \text{ is rational},\ F(s \mid \mathfrak{I}) \ge j/k \}. \tag{1}$$

To verify that $X_{jk}$ is indeed a random variable, one need only observe that $[X_{jk} < x] = \bigcup \{ [F(s \mid \mathfrak{I}) \ge j/k] \mid s < x,\ s \text{ rational} \}$ for every real number $x$. By this definition $X_{jk}$ is measurable with respect to $\mathfrak{I}$. Let $T$ denote both the measure-preserving set transformation determined by $\{X_n\}$ and the transformation of a random variable that is measurable with respect to the sigma-field determined by $\{X_n\}$. Then $TX_{jk} = X_{jk}$ and $T[X_i \in B] = [X_{i+1} \in B]$ for every linear Borel set $B$. Formula (1) easily implies that


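$$F(X_{jk} - 0 \mid \mathfrak{I}) \le j/k \le F(X_{jk} \mid \mathfrak{I}), \tag{2}$$

It also implies that no $\mathfrak{I}$-measurable random variable which is smaller than $X_{jk}$ with positive probability satisfies inequality (2). Since

$$F_n(x) = n^{-1}\sum_{i=1}^{n} I_{[X_i \le x]},$$

it follows that

$$F_n(X_{jk}) = n^{-1}\sum_{i=1}^{n} I_{[X_i \le X_{jk}]}.$$

It is now shown that the sequence of random variables

$$\{ I_{[X_i \le X_{jk}]},\ i = 1, 2, \dots \}$$

is strictly stationary. Indeed, let $\{r_m\}$ denote the set of all rational numbers. Then, by the properties of $T$,

$$T[X_i > X_{jk}] = T\Big(\bigcup_m [X_i > r_m][X_{jk} < r_m]\Big) = \bigcup_m [X_{i+1} > r_m][X_{jk} < r_m] = [X_{i+1} > X_{jk}].$$

Thus $T[X_i \le X_{jk}] = [X_{i+1} \le X_{jk}]$.

By the ergodic theorem stated above, we get

$$P\big[ F_n(X_{jk}) \to P\{[X_1 \le X_{jk}] \mid \mathfrak{I}\} \big] = 1.$$

Since $X_{jk}$ is $\mathfrak{I}$-measurable, the limit is $F(X_{jk} \mid \mathfrak{I})$. The same argument applied to the events $[X_i < X_{jk}]$ shows that $F_n(X_{jk} - 0) \to F(X_{jk} - 0 \mid \mathfrak{I})$.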







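Let $x$ be any real number, and let

$$A_1 = [x < X_{1k}], \qquad A_j = [X_{j-1,k} \le x < X_{jk}], \quad 2 \le j \le k-1,$$

and

$$A_k = [X_{k-1,k} \le x].$$

(It should be noted that any of the events $A_2, \dots, A_{k-1}$ can be empty.)

We further use the notational conventions $F(X_{0k} \mid \mathfrak{I}) = 0$ and $F(X_{kk} \mid \mathfrak{I}) = 1$. Then, for fixed $k$ and fixed $x$, we may write

$$\sum_{j=1}^{k} F(X_{j-1,k} \mid \mathfrak{I})\, I_{A_j} \le F(x \mid \mathfrak{I}) \le \sum_{j=1}^{k} F(X_{jk} - 0 \mid \mathfrak{I})\, I_{A_j} \tag{3}$$

and

$$\sum_{j=1}^{k} F_n(X_{j-1,k})\, I_{A_j} \le F_n(x) \le \sum_{j=1}^{k} F_n(X_{jk} - 0)\, I_{A_j}. \tag{4}$$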


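From inequality (2) we obtain

$$F(X_{jk} - 0 \mid \mathfrak{I}) - F(X_{j-1,k} \mid \mathfrak{I}) \le 1/k. \tag{5}$$

Inequalities (3), (4) and (5) yield

$$\begin{aligned}
F_n(x) - F(x \mid \mathfrak{I}) &\le \sum_{j=1}^{k} \big(F_n(X_{jk} - 0) - F(X_{j-1,k} \mid \mathfrak{I})\big) I_{A_j} \\
&= \sum_{j=1}^{k} \big(F_n(X_{jk} - 0) - F(X_{jk} - 0 \mid \mathfrak{I})\big) I_{A_j} + \sum_{j=1}^{k} \big(F(X_{jk} - 0 \mid \mathfrak{I}) - F(X_{j-1,k} \mid \mathfrak{I})\big) I_{A_j} \\
&\le \max_{1 \le j \le k} \big| F_n(X_{jk} - 0) - F(X_{jk} - 0 \mid \mathfrak{I}) \big| + 1/k.
\end{aligned} \tag{6}$$

In precisely the same manner we arrive at

$$F_n(x) - F(x \mid \mathfrak{I}) \ge -1/k - \max_{1 \le j \le k} \big| F_n(X_{jk}) - F(X_{jk} \mid \mathfrak{I}) \big|. \tag{7}$$

Combining inequalities (6) and (7) we obtain

$$|F_n(x) - F(x \mid \mathfrak{I})| \le 1/k + \max_{1 \le j \le k} \max\big\{ |F_n(X_{jk}) - F(X_{jk} \mid \mathfrak{I})|,\ |F_n(X_{jk} - 0) - F(X_{jk} - 0 \mid \mathfrak{I})| \big\}. \tag{8}$$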


Since the right-hand side of (8) does not depend on $x$, (8) continues to hold if we take the supremum of the left-hand side over all real $x$. Now take the lim sup of both sides as $n \to \infty$. The maximum on the right then tends to zero, because it is a maximum of finitely many terms, each tending to zero. Finally, since the integer $k$ may be arbitrarily large, we obtain the conclusion of the theorem.
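Inequality (8) has a concrete, non-random counterpart that can be checked on any sample. Consider the special case of i.i.d. observations with a known continuous distribution function $F$; we assume the standard normal purely for illustration. Then $F(x \mid \mathfrak{I})$ reduces to $F(x)$, and $X_{jk}$ is the $j/k$ quantile. The argument leading to (8) then says that the supremum deviation never exceeds $1/k$ plus the largest deviation at the $k - 1$ interior quantiles. The sketch below uses hypothetical names.

```python
import bisect
import random
from statistics import NormalDist


def grid_bound(sample, k, dist=NormalDist()):
    """Return (sup_x |F_n(x) - F(x)|, 1/k + max_j |F_n(X_jk) - F(X_jk)|) for a continuous F.

    X_jk = inf{s : F(s) >= j/k} is dist.inv_cdf(j/k) for j = 1, ..., k-1; the end points
    X_0k = -inf and X_kk = +inf contribute nothing, since F_n and F agree there.
    """
    xs = sorted(sample)
    n = len(xs)

    def F_n(t):
        return bisect.bisect_right(xs, t) / n  # #{X_i <= t} / n

    sup_dev = max(max(i / n - dist.cdf(x), dist.cdf(x) - (i - 1) / n)
                  for i, x in enumerate(xs, start=1))
    grid = [dist.inv_cdf(j / k) for j in range(1, k)]
    grid_dev = max((abs(F_n(q) - dist.cdf(q)) for q in grid), default=0.0)
    return sup_dev, 1 / k + grid_dev


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
for k in (2, 5, 20, 100):
    sup_dev, bound = grid_bound(sample, k)
    print(k, round(sup_dev, 4), round(bound, 4))
```

Larger $k$ tightens the $1/k$ term but adds grid points; the proof takes $n \to \infty$ first and only then lets $k$ grow.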


REFERENCES


[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.

[2] B. V. Gnedenko, Kurs Teorii Veroyatnostyey, Gosudarstvennoye Izdatyel'stvo Technico-Teoreticheskoi Literaturi, Moscow, 1950.

[3] M. Loève, Probability Theory, D. van Nostrand, New York, 1955.







THE LIMITING DISTRIBUTION OF THE SERIAL CORRELATION
COEFFICIENT IN THE EXPLOSIVE CASE II


By John S. White
Aero Division, Minneapolis-Honeywell Regulator Company
Introduction and summary.¹ A standard linear regression model is

$$x_t = \alpha y_t + u_t \qquad (t = 1, 2, 3, \dots, T), \tag{1}$$

where $\alpha$ is an unknown parameter, the $y$'s are known parameters and the $u$'s are NID$(0, \sigma^2)$. The maximum likelihood estimators for $\alpha$ and $\sigma^2$ are

$$\hat\alpha = \frac{\sum x_t y_t}{\sum y_t^2}, \qquad \hat\sigma^2 = \frac{\sum (x_t - \hat\alpha y_t)^2}{T}. \tag{2}$$

The statistic

$$\frac{(\hat\alpha - \alpha)}{\hat\sigma}\Big(\sum y_t^2\Big)^{1/2}\Big(\frac{T-1}{T}\Big)^{1/2} \tag{3}$$

then has a $t$ distribution with $T - 1$ d.f., and its limiting distribution is $N(0, 1)$.

One approach to time-series analysis is to set $y_t = x_{t-1}$, with $y_1 = x_0$ a constant. The model (1) is then transformed into the stochastic difference equation

$$x_t = \alpha x_{t-1} + u_t \qquad (t = 1, 2, \dots, T). \tag{4}$$

The maximum likelihood estimators for $\alpha$ and $\sigma^2$ in (4) are

$$\hat\alpha = \frac{\sum x_t x_{t-1}}{\sum x_{t-1}^2}, \qquad \hat\sigma^2 = \frac{\sum (x_t - \hat\alpha x_{t-1})^2}{T}, \tag{5}$$

which are exactly the values one would obtain by substituting $y_t = x_{t-1}$ in (2).
In this paper it is shown that

$$W = \frac{(\hat\alpha - \alpha)}{\hat\sigma}\Big(\sum x_{t-1}^2\Big)^{1/2}, \tag{6}$$

which is the analogue of (3), has a limiting $N(0, 1)$ distribution, except perhaps when $|\alpha| = 1$. This result is well known for $|\alpha| < 1$ and was proved by Mann and Wald [1] under much more general conditions. The feature of the proof presented here is that it also holds in the explosive case ($|\alpha| > 1$).
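The limiting result can be examined by simulation. The Python sketch below is an illustration under assumptions of our own choosing: $x_0 = 0$, $\sigma = 1$, and particular values of $\alpha$ and $T$. It computes $W$ from (5) and (6) for repeated simulated paths of (4). It then reports the mean, the variance and the proportion of $|W| \le 1.96$, which should be near 0, 1 and 0.95 if $W$ is approximately $N(0, 1)$. The function names are hypothetical.

```python
import math
import random
import statistics


def w_statistic(xs, alpha):
    """W of (6) for a path xs = [x_0, x_1, ..., x_T], using the estimators (5)."""
    lagged, current = xs[:-1], xs[1:]
    T = len(current)
    sxx = sum(x * x for x in lagged)
    alpha_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    sigma_hat = math.sqrt(sum((b - alpha_hat * a) ** 2 for a, b in zip(lagged, current)) / T)
    return (alpha_hat - alpha) * math.sqrt(sxx) / sigma_hat


def ar1_path(alpha, T, rng, x0=0.0, sigma=1.0):
    """Simulate x_t = alpha * x_{t-1} + u_t, u_t ~ N(0, sigma^2), starting from x0."""
    xs = [x0]
    for _ in range(T):
        xs.append(alpha * xs[-1] + rng.gauss(0.0, sigma))
    return xs


def w_summary(alpha, T, reps=2000, seed=0):
    """Mean, variance and share of |W| <= 1.96 over repeated simulated paths."""
    rng = random.Random(seed)
    ws = [w_statistic(ar1_path(alpha, T, rng), alpha) for _ in range(reps)]
    inside = sum(abs(w) <= 1.96 for w in ws) / reps
    return statistics.fmean(ws), statistics.variance(ws), inside


print(w_summary(1.1, 100))  # explosive case: roughly (0, 1, 0.95)
```

A stationary value such as $\alpha = 0.5$ needs a longer series (here $T = 400$) before the small-sample bias of $\hat\alpha$ becomes negligible on the scale of $W$.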


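The limiting distribution. We define the quadratic forms

$$R = \frac{1}{g\sigma^2}\Big(\sum x_t x_{t-1} - \alpha \sum x_{t-1}^2\Big), \qquad S = \frac{1}{g^2\sigma^2}\sum x_{t-1}^2,$$

where

$$g = g(T, \alpha) = \begin{cases} \big(T/(1 - \alpha^2)\big)^{1/2} & \text{for } |\alpha| < 1, \\ |\alpha|^T/(\alpha^2 - 1) & \text{for } |\alpha| > 1. \end{cases}$$

Received August 5, 1957; revised March 1, 1959.
¹ The author wishes to acknowledge several comments of the referee which simplified the presentation of these results.

It has been shown [2] that the limit of the joint characteristic function of $R$ and $S$ is

$$\phi(u, v) = \begin{cases} \exp\big(iv - u^2/2\big) & \text{for } |\alpha| < 1, \\[4pt] (1 + u^2 - 2iv)^{-1/2} \exp\Big\{ -\dfrac{x_0^2(\alpha^2 - 1)(u^2 - 2iv)}{2\sigma^2(1 + u^2 - 2iv)} \Big\} & \text{for } |\alpha| > 1. \end{cases}$$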


Let $r$ and $s$ be random variables with joint characteristic function $\phi(u, v)$. Then the limiting distribution of

$$\frac{(\hat\alpha - \alpha)}{\sigma}\Big(\sum x_{t-1}^2\Big)^{1/2} = \frac{R}{\sqrt{S}}$$

is the same as the distribution of $r/\sqrt{s}$. To obtain the distribution of $r/\sqrt{s}$ we must invert $\phi(u, v)$.

For $|\alpha| < 1$ we see from the form of $\phi(u, v)$ that $r$ is $N(0, 1)$ and $\text{Prob}(s = 1) = 1$. Therefore $r/\sqrt{s}$ is also $N(0, 1)$.

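For $|\alpha| > 1$ the joint distribution of $r$ and $s$ is not obvious. However, if we set

$$p = \frac{x_0^2(\alpha^2 - 1)}{2\sigma^2},$$

we may expand $\phi(u, v)$ as

$$\phi(u, v) = e^{-p} \sum_{j=0}^{\infty} \frac{p^j}{\Gamma(j+1)} (1 + u^2 - 2iv)^{-(j + 1/2)}.$$

Inverting $\phi(u, v)$ first with respect to $v$ we have

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ivs}\phi(u, v)\,dv = e^{-p}\sum_{j=0}^{\infty} \frac{s^{j - 1/2}\exp\big(-s[1 + u^2]/2\big)\,p^j}{2^{j+1/2}\,\Gamma(j + \tfrac12)\,\Gamma(j + 1)}.$$

Inverting next with respect to $u$ we have

$$f(r, s) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iur}\Big[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ivs}\phi(u, v)\,dv\Big]du = \frac{1}{2\sqrt{\pi}\,s}\exp\Big(-\frac{r^2}{2s} - \frac{s}{2} - p\Big)\sum_{j=0}^{\infty}\frac{(ps/2)^j}{\Gamma(j + \tfrac12)\,\Gamma(j + 1)}.$$

To obtain the distribution of $r/\sqrt{s}$ we make the change of variable $w = r/\sqrt{s}$ in $f(r, s)$ and then integrate out $s$. We have

$$f(w, s) = \frac{1}{2\sqrt{\pi s}}\exp\Big(-\frac{w^2}{2} - \frac{s}{2} - p\Big)\sum_{j=0}^{\infty}\frac{(ps/2)^j}{\Gamma(j + \tfrac12)\,\Gamma(j + 1)},$$

$$f(w) = \int_0^\infty f(w, s)\,ds = \frac{e^{-w^2/2}}{\sqrt{2\pi}}\, e^{-p}\sum_{j=0}^{\infty}\frac{p^j}{\Gamma(j + 1)} = \frac{e^{-w^2/2}}{\sqrt{2\pi}}.$$

Thus $r/\sqrt{s}$ is again $N(0, 1)$.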
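To obtain the limiting distribution of $W$, we note that

$$\frac{\sum (x_t - \alpha x_{t-1})^2}{T} = \frac{\sum u_t^2}{T} \to \sigma^2$$

by the law of large numbers, and therefore

$$\hat\sigma^2 = \frac{\sum (x_t - \hat\alpha x_{t-1})^2}{T} = \frac{\sum (x_t - \alpha x_{t-1})^2}{T} - \frac{(\hat\alpha - \alpha)^2\sum x_{t-1}^2}{T} \to \sigma^2.$$

The second term tends to zero because $(\hat\alpha - \alpha)^2 \sum x_{t-1}^2 = \sigma^2 R^2/S$ has a limiting distribution. Hence the limiting distribution of

$$W = \frac{(\hat\alpha - \alpha)}{\hat\sigma}\Big(\sum x_{t-1}^2\Big)^{1/2}$$

is the same as that of

$$\frac{(\hat\alpha - \alpha)}{\sigma}\Big(\sum x_{t-1}^2\Big)^{1/2} = \frac{R}{\sqrt{S}},$$

and hence $W$ is asymptotically $N(0, 1)$.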
Applications. For large samples $W$ is approximately $N(0, 1)$, and hence it may be used to construct confidence intervals for $\alpha$. For example, a symmetric 95% confidence interval for $\alpha$ would be

$$\hat\alpha - 1.96\,\frac{\hat\sigma}{(\sum x_{t-1}^2)^{1/2}} \le \alpha \le \hat\alpha + 1.96\,\frac{\hat\sigma}{(\sum x_{t-1}^2)^{1/2}}.$$
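The following is a minimal sketch of the interval above, together with a coverage check on simulated paths. The simulation settings ($x_0 = 0$, $\sigma = 1$, $\alpha = 1.1$, $T = 100$) are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random


def ar1_interval(xs, z=1.96):
    """Large-sample interval alpha_hat -/+ z * sigma_hat / sqrt(sum x_{t-1}^2), using (5)."""
    lagged, current = xs[:-1], xs[1:]
    T = len(current)
    sxx = sum(a * a for a in lagged)
    alpha_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    sigma_hat = math.sqrt(sum((b - alpha_hat * a) ** 2 for a, b in zip(lagged, current)) / T)
    half_width = z * sigma_hat / math.sqrt(sxx)
    return alpha_hat - half_width, alpha_hat + half_width


def coverage(alpha, T=100, reps=1000, seed=1):
    """Share of simulated AR(1) paths (x_0 = 0, N(0, 1) errors) whose interval covers alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [0.0]
        for _ in range(T):
            xs.append(alpha * xs[-1] + rng.gauss(0.0, 1.0))
        lo, hi = ar1_interval(xs)
        hits += lo <= alpha <= hi
    return hits / reps


print(coverage(1.1))  # near 0.95 when the normal approximation is adequate
```

Because the half-width is scaled by $(\sum x_{t-1}^2)^{-1/2}$, the interval narrows very rapidly in the explosive case, where $\sum x_{t-1}^2$ grows like $\alpha^{2T}$.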


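The likelihood ratio criterion for testing the hypothesis $H: \alpha = \alpha_0$ against the alternative $\alpha \ne \alpha_0$ is

$$\lambda = \Big[1 - \frac{(\hat\alpha - \alpha_0)^2\sum x_{t-1}^2}{\sum (x_t - \alpha_0 x_{t-1})^2}\Big]^{T/2}.$$

Asymptotically, with $W$ evaluated at $\alpha = \alpha_0$,

$$\lambda \doteq \Big(1 - \frac{W^2}{T}\Big)^{T/2}, \qquad -2\log\lambda \doteq W^2.$$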







Hence the limiting distribution of $-2\log\lambda$ is a chi-squared distribution with 1 d.f.

For testing this hypothesis, a large-sample critical region which might be used is

$$|W| > z_{1 - p/2},$$

where $p$ is the level of significance and $z_{1-p/2}$ is the $100(1 - p/2)$ percentile point of the normal distribution.

It should be noted that the above results probably do not hold for $|\alpha| = 1$.
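The likelihood ratio statistic and the critical region can both be computed directly from residual sums of squares. The sketch below is our illustration, with $x_0 = 0$, $\sigma = 1$, $T = 100$ and $\alpha_0 = 1.1$ assumed. It evaluates $-2\log\lambda$ from the displayed expression for $\lambda$, working through the logarithm to avoid underflow, and compares it with $W^2$ evaluated at $\alpha = \alpha_0$. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lr_and_w(xs, alpha0):
    """Return (-2 log lambda, W) for H: alpha = alpha0, for a path xs = [x_0, ..., x_T]."""
    lagged, current = xs[:-1], xs[1:]
    T = len(current)
    sxx = sum(a * a for a in lagged)
    alpha_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    rss_hat = sum((b - alpha_hat * a) ** 2 for a, b in zip(lagged, current))
    rss_0 = sum((b - alpha0 * a) ** 2 for a, b in zip(lagged, current))
    ratio = (alpha_hat - alpha0) ** 2 * sxx / rss_0
    minus_2_log_lambda = -T * math.log(1.0 - ratio)  # lambda = (1 - ratio)^(T/2)
    w = (alpha_hat - alpha0) * math.sqrt(sxx) / math.sqrt(rss_hat / T)
    return minus_2_log_lambda, w


def rejects(xs, alpha0, p=0.05):
    """Large-sample critical region |W| > z_{1 - p/2}."""
    _, w = lr_and_w(xs, alpha0)
    return abs(w) > NormalDist().inv_cdf(1 - p / 2)


rng = random.Random(2)
xs = [0.0]
for _ in range(100):
    xs.append(1.1 * xs[-1] + rng.gauss(0.0, 1.0))
stat, w = lr_and_w(xs, 1.1)
print(round(stat, 4), round(w * w, 4), rejects(xs, 1.1))
```

The two statistics agree closely for moderate $W$. Their difference is of order $W^4/T$, which is why both lead to the same one-degree-of-freedom chi-squared limit.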
REFERENCES
[1] H. B. Mann and A. Wald, "On the statistical treatment of linear stochastic difference equations," Econometrica, Vol. 11 (1943), pp. 173-200.

[2] J. S. White, "The limiting distribution of the serial correlation coefficient in the explosive case I," Ann. Math. Stat., Vol. 29 (1958), pp. 1188-1197.





ABSTRACTS OF PAPERS 


(Additional abstract of a paper presented at the Pittsburgh meeting of the Institute, 
March 19-21, 1959.) 


33. The Comparison of the Sensitivities of Similar Experiments: Model II of the Analysis of Variance. D. E. W. Schumann and R. A. Bradley, University of Stellenbosch, South Africa, and Virginia Polytechnic Institute.


When alternative scales of measurement or experimental techniques are available, it is desirable to use the scale or technique that is more sensitive in exhibiting treatment differences in the experiment. Sometimes it will be desirable to do preliminary experimentation in order to choose the more sensitive technique.

We consider parallel experiments with samples from the same experimental treatments and similar experimental designs. The experiments must be conducted so as to be independent in probability and appropriate for the use of the analysis of variance.

In earlier work [Biometrics 13 (4), 1957] we considered Model I of the analysis of variance. There the comparison was based essentially on the distribution of the ratio of the F-statistics from the two experiments, each F being assumed to have the non-central variance-ratio distribution. The distribution of the ratio was approximated by the distribution of the ratio of two central variance ratios with appropriately adjusted degrees of freedom.

Here we consider Model II of the analysis of variance, in which the more sensitive procedure is the one with the relatively larger component of variance for treatments. In this case the ratio of two central variance ratios is required directly. Additional tables are provided for use with either Model I or Model II, and an example is included. A confidence interval is also provided.


(Abstracts of papers not presented at any meeting of the Institute.) 


1. Simultaneous Comparison of Tests. R. R. Bahadur, Indian Statistical Institute. (By title)


Let $\Omega$ be a parameter space of points $\theta$ and let $x$ be a sample point with distribution $P_\theta$. Let $\Omega_0$ be a subset of $\Omega$. A sequence $T^{(1)}(x), T^{(2)}(x), \dots$ of real-valued statistics is then said to be standard (for testing $\Omega_0$) if three conditions hold. (I) $\lim_{n\to\infty} P_\theta(T^{(n)} \ge x) = L(x)$ (say) for every $x$ and every $\theta \in \Omega_0$. (II) $\log L(x) = -a x^2[1 + o(1)]$ as $x \to \infty$, where $0 < a < \infty$. (III) For each $\theta$ in $\Omega - \Omega_0$, $T^{(n)}/n^{1/2} \to b(\theta)$ with probability one as $n \to \infty$, where $0 < b < \infty$. It is pointed out that if $\{T_1^{(n)}\}$ and $\{T_2^{(n)}\}$ are standard sequences then, for any given $\theta \in \Omega - \Omega_0$, $\varphi = a_1 b_1^2/a_2 b_2^2$ serves as the asymptotic efficiency of $\{T_1^{(n)}\}$ relative to $\{T_2^{(n)}\}$, with attainment of an assigned significance level as the criterion. In particular,

$$L_1(T_1^{(n)})/L_2(T_2^{(n)}) \to \infty\ (0)$$

with probability one as $n \to \infty$ if $\varphi < 1\ (> 1)$. Again, let $N_i = \inf\{m: L_i(T_i^{(m)}) \le \epsilon\}$ and $N_i^* = \inf\{m: L_i(T_i^{(n)}) \le \epsilon \text{ for all } n \ge m\}$ $(i = 1, 2)$. Then both ratios $N_2/N_1$ and $N_2^*/N_1^*$ tend to $\varphi$ with probability one as $\epsilon \to 0$. As examples, the paper discusses simultaneous comparisons of three kinds. These are the sign and $t$ tests of a mean; the Wald-Wolfowitz test, the Smirnov test and the $t$ test for two samples; and the Kruskal-Wallis test and the $F$ test for $k$ samples. The first-mentioned example is also studied in another paper (Ann. Math. Stat., 1959, p. 623 (abstract)). That study assumes normality and uses the exact levels $L_i^{(n)}(T_i^{(n)})$ rather than the asymptotic levels $L_i(T_i^{(n)})$.









2. Mill's Ratio and Linear Truncation for Some Pearson Curves. William Felling and Waldo A. Vezeau, St. Louis University. (By title)


Any statistical distribution is completely determined when the parameters of the distribution are known. When some variates are deliberately excluded from the sample population, the determination of these parameters presents an interesting problem in statistical point estimation. The author has developed estimating equations for the parent population parameters for the case where samples are taken from a Beta distribution and the sample has been truncated on either or both tails. The estimating equations are developed by the method of maximum likelihood. These equations involve Mill's ratio of truncated area to bounding ordinates, so continued fractions are employed to obtain bounds on this ratio. The bounds are obtained from approximants to the continued fractions. Using successively higher-order approximants increases the accuracy of the bounds as estimates of Mill's ratio.


3. On Cumulative Function Theory. Joseph M. Moser and Waldo A. Vezeau, St. Louis University. (By title)


New cumulative functions are developed in this paper by using the Bernoulli differential equation, $dy/dx + Py = Qy^n$, where $y$ is the cumulative frequency function $F(x)$. This is a generalization of Burr's differential equation.

The paper discusses the $n$th cumulative moments for some of these newly developed cumulative frequency functions, and for cumulative frequency functions developed by other authors.

It is shown that when the method of curve fitting by moments fails, other methods can be used, such as interpolation, ratio M and others. Ratio M was developed because the function $[x^{-c} + 1]^{-k}$ could not be handled in curve fitting by moments. Ratio M is the ratio of the abscissa of the mode to the abscissa of the median. The values of the parameters are chosen by means of a chart of values of M.

To simplify the theory, the Stieltjes integral is introduced to define cumulative moments.

Finally, a discussion of reliability functions is presented. A reliability function is defined to be $R(x) = 1 - F(x)$. The moments defined for $R(x)$ are very similar to those for cumulative functions and can be expressed in terms of the cumulative moments of $F(x)$.


4. On Sampling Distributions Derived by Cumulative Characteristic Function Methods (Preliminary Report). Jose R. Papro and Waldo A. Vezeau, St. Louis University. (By title)
New relations and properties of cumulative characteristic functions are presented. Derivations of sampling distributions for particular cumulative functions are given.


5. An Extension of the Theory of Cumulative Frequency Functions to N Variables. Sr. Mary Alberta Uzendoski and Waldo A. Vezeau, St. Louis University. (By title)


The purpose of this work is to generalize the existing theory of moments, characteristic functions and moment-generating functions to multivariate cumulative distributions. Statistical independence and dependence served as the basis for a further subdivision of the extension.

For the independent case, the moment theory consisted of a definition of a cumulative moment about the origin, as well as about $x_i = a_i$, a more general point than heretofore given.

For the dependent case, unlike the earlier definition of a cumulative moment, the moment was defined in terms of marginal distributions. This eliminated the definitions of special functions previously needed to formulate the moment definition. The definitions of the moments for the two cases differ, but they are composed of the same number of terms. Moreover, if statistical independence is assumed in the dependent case, the moment reduces to the cumulative moment as defined for a function of n independent variables. The relation between cumulative moments was also found to be in accord with that for the independent case.

Similarly, an extension was made of the theory of cumulative characteristic and moment-generating functions found in previous theoretical work.





NEWS AND NOTICES
Readers are invited to submit to the Secretary of The Institute news items of interest.
Personal Items


Dr. Gilbert W. Beebe, statistician of the Division of Medical Sciences, NAS-NRC, is in Japan serving a two-year tour as head of the Statistics Department of the Atomic Bomb Casualty Commission, a field agency of the NAS-NRC at APO 354, San Francisco, California.

Albert Bowker has been appointed Dean of the Graduate Division, Stanford 
University. 

L. S. Brenna has joined the Texaco Research Center, Beacon, New York.

Benjamin Buchbinder, formerly with the Martin Company, has accepted a 
position as statistician in the Systems Engineering Department, Special Products 
Division, of the Burroughs Corporation Research Center, Paoli, Pennsylvania. 

Dr. Bernard J. Derwort has been appointed Associate Professor of Mathe- 
matics and Chairman of the Department at the College of St. Thomas in St. 
Paul, Minnesota. 

Ronald S. Dick, a former graduate student at Columbia University, Dept. of Math. Stat., is now a Reliability Engineer with Sperry Gyroscope Co., Great Neck, New York, and Night Instructor in Mathematics, Queens College, Flushing, New York.

George E. Ferris is Statistical Staff Specialist, Post Cereals Division, General 
Foods Corporation, 275 Cliff Street, Battle Creek, Michigan. 

Raymond I. Fields was promoted to Associate Professor of Mathematics, Speed Scientific School, University of Louisville, Louisville, Kentucky; he is also serving as Co-director of the University's Computing Laboratory.

Dr. J. Gani has taken leave of absence from the University of Western Australia from February 1959, to spend a year as Visiting Associate Professor in the Department of Mathematical Statistics at Columbia University.

Dr. Landis S. Gephart, formerly Chief of the Statistics Branch and the Design of Experiment Office of the Office of Ordnance Research, Durham, North Carolina, has accepted the position of Scientific Advisor, European Research Office, Rheingau Allee 2, Frankfurt am Main, Germany. Dr. Gephart and his family are now residing in Frankfurt. Their APO address is: U. S. Army R and D Liaison Group, 9851 DU, APO 757, New York, New York.

Sudhish G. Ghurye is now an Associate Professor in the Department of Mathe- 
matics at Northwestern University, Evanston, Illinois. 

E. J. Gumbel has accepted an invitation to serve as Visiting Professor at the 
Free University of Berlin (West) during the summer term 1959. 

Mortimer B. Keats, formerly with the Missile Guidance Subsection, General 
Electric Company, Utica, New York, is now with the Knolls Atomic Power 
Laboratory, General Electric Company, Schenectady, New York. 

Professor Sixto Rios, Director of the Instituto de Investigaciones Estadisticas (Madrid), has been elected "Academico Numerario de la Real Academia de Ciencias de Madrid".


Herbert Solomon has been appointed Professor of Statistics and Executive Head of the Statistics Department at Stanford University, beginning September, 1959.


Sidney Weiner is presently employed as Senior Mathematical Statistician by 
ARINC Research Corp., Washington, D. C. 


Evan J. Williams, formerly Professor of Statistics at North Carolina State 
College, is now giving his full time to Moral Re-Armament. 


New Members 
The following persons have been elected to membership in the Institute 
February 7, 1959, to May 7, 1959 


Asai, Akira, M.Sc., (Nagoya U.), Faculty of Arts and Science, Lecturer of Statistics, Chiba 
University, 824 Konakadai-machi, Chiba-city, Japan; 603 Yoyogi-Hatsudat, Shibuya-ku, 
Tokyo, Japan. 

Bhatia, Sat Paul, M.A., (Punjab University), Student, University of North Carolina, Depart- 
ment of Statistics, Chapel Hill, North Carolina. 

Brenner, Cecil F., Ph.D., (Brooklyn Polytechnic Institute), Statistical Analyst, Johnson 
and Johnson, New Brunswick, New Jersey. 

Barral Courtis, Jose, Actuary, C.P.A., (University of Buenos Aires), Data Processing Di- 
vision, I.B.M. World Trade, Argentina; IBM World Trade Corp., Avda. Roque Saenz 
Pena 98, Buenos Aires, Argentina. 

Basmann, Robert L., Ph.D., (Iowa State College), Operations Research Analyst, Han- 
ford Laboratories Operation, Operations Research and Synthesis Operation, General 
Electric Company, Hanford Atomic Products Operation, Richland, Washington. 

Brillinger, David R., B.A., (University of Toronto), student, University of Toronto, 
Toronto, Ontario, Canada; 182 Ellerslie Ave., Willowdale, Ontario, Canada. 

Buehler, Robert J., Ph.D., (Univ. of Wisconsin), Assistant Professor of Statistics, Iowa 
State College, Ames, Iowa. 

Bush, Norman, M.B.A., (C.C.N.Y.), Student, University of North Carolina, Dept. of 
Statistics, Chapel Hill, N. C.; 506 Severin Street, Chapel Hill, N.C. 

Capon, Jack, M.S., (M.I.T.), Instructor, Columbia University, New York City, New 
York; 30-05 94 Street, East Elmhurst 69, New York. 

Carrick, Paul M., Jr., Ph.D., (University of California), Statistician, Convair Astro- 
nautics, A Division of General Dynamics, Inc., Kearny Mesa Road, San Diego, Cali- 
fornia; 6668 Mohawk St., San Diego 15, Calif. 

Chow, Gregory C., Ph.D., (Univ. of Chicago), Assistant Professor of Industrial Manage- 
ment, Mass. Inst. of Tech., 50 Memorial Drive, Cambridge 39, Mass. 

Christian, David B., B.A., (Univ. of Minn.), Mathematician, Bendix Products Div., Bendix Aviation Corp., 400 S. Beiger St., Mishawaka, Indiana; 60189 Bremen Hwy., R.R.2, Mishawaka, Indiana.

Cobb, Whitfield, Ph.D., (U. of N. C.), Assistant Professor of Mathematics, Woman’s Col- 
lege of the University of N. C., Greensboro, N. C. 

Cruise, Sydney E., M.A. (Cantab.), Senior Lecturer in Statistics, University of Natal, 
Howard College, Durban, South Africa. 

Doehlert, David H., M.Ed., (Temple U.), Statistician, E. I. duPont deNemours and Co., 
Wilmington, Del. 

Dubey, Satya D., B.Sc., (Patna Univ.), Special Graduate Research Assistant, Dept. of 
Statistics, M.S.U., E. Lansing, Michigan. 





840 NEWS AND NOTICES 


Freedman, David A., B.Sc. (McGill University), Graduate Student, Princeton University, 
Princeton, N. J.; Graduate College, Princeton, N. J. 

Gillespie, Raymond H., M.S., (Purdue University), Head, Computing Laboratory, Metals 
Research Laboratory, Metals Research Laboratories of Union Carbide Metals Company, 
4625 Royal Avenue, Niagara Falls, N. Y. 

Green, Eric A., M.A., (Queen’s Univ.), Graduate Student, Research Assistant, Dept. of 
Stat., University of North Carolina, Chapel Hill, N. C.; 106 Connor, Chapel Hill, 
N.C. 

Irwin, Joseph Oscar, Sc.D., (Cambridge), D.Sc. (London), Visiting Professor, Dept. of 
Biostatistics, Univ. of North Carolina, Chapel Hill, N. C. 

Jessop, William N., B.Sc., (Imperial College), Head of Operational Research, Courtaulds, 
Ltd., Coventry, England. 

Kattsoff, Louis O., Ph.D., (Univ. of Penn.), Professor, Harpur College, State University 
of New York, Endicott, New York; 907 Park Street, Endicott, New York. 

Lee, Chung Gul, B.S., (Pusan National Univ.), Assistant Professor, Pusan Teachers’ 
College, Kuche-dong, Pusan, Korea. 

Lewish, William Thomas, M.Sc., (N. C. State), Student, Iowa State College Statistical 
Laboratory, Ames, Iowa. 

Losee, Garrie J., B.A., (Math., Hofstra College), Math. Stat.; Instructor, Bureau of the 
Census, U.S.D.A. Graduate School, Washington 25, D. C.; 4006 48th Street, Bladens- 
burg, Md. 

Magistad, John Gilbert, M.S., (Iowa State College), Graduate Assistant, N. C. State 
College, Raleigh, N. C.; 2483 Wesley Rd., Raleigh, N. C. 

Marshall, Albert W., Ph.D., (University of Washington), Acting Assistant Prof., Stanford 
University, Stanford, California. 

Moore, James R., A.B., (Univ. of North Carolina), Math. Statistician, Ballistic Research 
Laboratories, Aberdeen Proving Ground, Maryland; 610 Market St., Aberdeen, Mary- 
land. 

Novikoff, Albert B. J., Ph.D., (Stanford U.), Research Mathematician, Stanford Research 
Inst., Menlo Park, California. 

Orense, Marcelo M., M.S. (Univ. of the Philippines), Student, University of North Caro- 
lina, Chapel Hill, N. C.; 416 Winston Dormitory, University of North Carolina, Chapel 
Hill, N.C. 

Perchonok, Philip A., B.S., (Bradley U.), Analyst, Math. and Physics Division, Midwest 
Research Institute, 425 Volker Boulevard, Kansas City 10, Missouri. 

Powell, Charles C., Systems and Procedures Analyst, Richfield Oil Corporation, Los 
Angeles, California; 260 East California Boulevard, Pasadena, California. 

Rabbat, Michael, B.S. (Inst. of Tech.), Teacher of Mathematics, College De La Salle, 
Daher, Cairo, Egypt; 100, Osman EBN Affan, Heliopolis, Cairo, Egypt. 

Ryan, Marie Vida, B.A., (Univ. of California), Statistician, California Dept. of 
Corrections, State Office Bldg. No. 1, Room 502, Sacramento 14, California; 3551 12th 
Avenue, Sacramento 17, California. 

Schatzoff, Martin, M.A., (New York University), Manager, Reliability Analysis and 
Prediction, IBM Corp., Product Development Laboratory, Endicott, New York; 38 
Brookside Ave., Apalachin, New York. 

Scheinok, Perry A., B.S., (City College of New York), Research Assistant, Indiana Uni- 
versity, Dept. of Mathematics, Bloomington, Indiana. 

Sen, Amode Ranjan, Ph.D., (U. of N. C.), Senior Statistician, Tocklai Experimental 
Station, Cinnamara, P. O., Assam, India. 

Shartle, Richard B., Ch.E., B.A., (Purdue University, Miami Univ.), Statistical Coordi- 
nator, The Standard Register Company, 626 Albany Street, Dayton 1, Ohio. 

Sternberg, Shirley deBobes, M.S., (New York Univ.), Fellow and Lab. Assistant in Bio- 
metrics, Faculty of Medicine, School of Public Health, Columbia University, 600 W 
168 Street, New York, N. Y.; 655 East 14 St., New York 9, N. Y. 







Thomas, Ronald Emerson, M.A., (Queen’s Univ.), Student, Univ. of North Carolina, 
Chapel Hill, N. C.; Dept. of Statistics, U. of North Carolina, Chapel Hill, N.C. 

Throckmorton, Thomas Neil, M.S., (Iowa State College), Student, Graduate Assistant, 
Statistical Laboratory, Iowa State College, Ames, Iowa. 

Trask, Richard K., B.S., (Manhattan College), Mathematical Statistician, Ballistic 
Research Laboratories, Aberdeen Proving Ground, Maryland; 402-F North Court Road, 
Aberdeen, Maryland. 


Truax, Harry Mack, M.S., (Univ. of Del.), Manager of Quality Control Section, Production 
Dept., Atlas Powder Co., Wilmington, Delaware. 

Vogel, Walter, Dr.rer.nat., (Univ. of Tübingen), Research Assistant (Wissenschaftlicher 
Assistent), Mathematisches Institut der Universität Tübingen, Wilhelmstrasse 7, 
Tübingen, Germany. 

Woll, John W., Jr., Ph.D., (Princeton University), Assistant Professor, University of 
California, Department of Mathematics, Berkeley 4, California. 

Yem, Edmund G. N., B.S., (U. of Cal.), Graduate Student, University of California; 1642 
Eighth Avenue, San Francisco 22, California. 

Yen, Elizabeth Hsi, M.A., (U. of Minn.), Research Assistant and Teaching Assistant, 
University of Minnesota, Minneapolis 14, Minn.; c/o Mr. James T. Yen, Dept. of Aero 
Engineering, University of Minn., Minneapolis 14, Minn. 

Zinger, Alexis, Ph.D., (Univ. of Montreal), Assistant Professor, Dept. of Mathematics, 
Center of Statistics, University of Montreal, P. O. Box 6128, Montreal, Quebec, Canada. 


New Institutional Members 


GENERAL ANALYSIS CORPORATION, 11753 Wilshire Boulevard, Los Angeles 25, Cal. 

* Harry W. Johnson, General Analysis Corp., 11753 Wilshire Blvd., Los Angeles 25, 
Cal. 

SPACE TECHNOLOGY LABORATORIES, Attn: STL Library, P. O. Box 9500, Los Angeles 45, 
Cal. 

* Dr. Robert W. Rector, Space Technology Lab., Inc., P. O. Box 9500, Los Angeles 45, 
Cal. 

* Denotes person to receive full prerogatives of membership. 


INTERNATIONAL JOURNAL OF ABSTRACTS, STATISTICAL THEORY 
AND METHOD 


The International Statistical Institute announces publication of a new abstract 
journal, International Journal of Abstracts: Statistical Theory and Method. The 
journal will appear quarterly (the first issue is dated July, 1959), and the annual 
subscription is $16.00 or £5. About 1,000 abstracts per year will be published. 

The aim of this new journal is to give complete coverage of papers in the 
field of statistical theory (including associated aspects of probability and other 
mathematical methods) and new contributions to statistical method as published 
after October 1, 1958. In the case of the following five journals, all contributions 
will be abstracted: Annals of Mathematical Statistics; Biometrika; Journal, Royal 
Statistical Society (Series B); Bulletin of Mathematical Statistics; Annals, Institute 
of Statistical Mathematics. 

A further group of six journals will be abstracted on a virtually complete 
basis, as follows: Biometrics; Metrika; Metron; Sankhyā; Technometrics; Review, 
International Statistical Institute. 







There are about 200 other journals partly devoted to statistical theory and 
method from which the appropriate papers will be abstracted. Journals from 
allied fields which contribute an occasional paper will also be covered and the 
relevant papers abstracted. In addition to this vast array of journal literature, 
it is intended to include abstracts of the special collections of papers as pub- 
lished in reports of conferences, symposia and seminars together with the pub- 
lished reports of experiment and other research stations. 

The abstracts will be in English. In adopting abstracts of up to 400-500 words 
(the UNESCO recommendation for the “long” abstract service), the International 
Statistical Institute hopes to fulfill a long-standing need for a service of 
informative abstracts in this important field. The format and simple binding 
allow users of this journal a choice of treatments: 

(a) Leave intact as a shelf periodical. 

(b) Split and file in page form according to the main sections of the classi- 
fication. 

(c) Split and guillotine (single cut only) each page ready for pasting on 
standard index cards or filing in standard loose-leaf binders for which the holes 
are punched. 

This journal is prepared by the following editorial organization under a 
General Editor in association with a Managing Editor for the American and 
Pacific Area. The General Editor works in conjunction with the Research 
Techniques Unit (London School of Economics) and the Managing Editor is 
on the staff of the Institute of Statistics (North Carolina State College). The 
International Statistical Institute gratefully acknowledges initial financial 
support for this journal from the National Science Foundation of America. 

General Editor, Dr. Wm. R. Buckland, c/o 55 Broadway, London, S.W. 1, 
England. 

Managing Editor (America and Pacific), Prof. R. L. Anderson, North Carolina 
State College, Raleigh, N. C., U.S.A. 

Regional Editors: 
Eastern Europe: Prof. A. Rényi (Budapest); 
France and Switzerland: Prof. D. Dugué (Paris); 
Germany and Austria: Dr. J. Pfanzagl (Vienna); 
Holland and Belgium: Prof. D. van Dantzig (Amsterdam); 
India and Pakistan: Prof. C. R. Rao (Calcutta); 
Italy: Prof. B. Colombo (Venice); 
Middle East: Dr. H. M. Husein (Cairo); 
Scandinavia: Prof. H. O. A. Wold (Uppsala); 
Spain and Portugal: Prof. S. Rios (Madrid); 
U.S.S.R.: Prof. B. V. Gnedenko (Kiev); 
United Kingdom and Ireland: Dr. Florence N. David (London); 
Australasia: Dr. J. Gani (W. Australia); 
Central and S. America: Prof. M. da Silva Rodrigues (São Paulo); 
Japan and China: Prof. K. Matusita (Tokyo); 
North America: Prof. R. L. Anderson (North Carolina). 




RESEARCH FELLOWSHIPS IN PSYCHOMETRICS 


The Educational Testing Service, Princeton, New Jersey, is offering for 1960-61 
its thirteenth series of research fellowships in psychometrics leading to the 
Ph.D. degree at Princeton University. Open to men who are acceptable to the 
Graduate School of the University, the two fellowships each carry a stipend of 
$2,650 a year and are normally renewable. Fellows will be engaged in part-time 
research in the general area of psychological measurement at the offices of the 
Educational Testing Service and will, in addition, carry a normal program of 
studies in the Graduate School. 

Suitable undergraduate preparation may consist either of a major in psy- 
chology with supporting work in mathematics, or a major in mathematics to- 
gether with some work in psychology. However, in choosing fellows, primary 
emphasis is given to superior scholastic attainment and research interests rather 
than to specific course preparation. 

The closing date for completing applications is January 1, 1960. Information 
and application blanks will be available about September 15 and may be ob- 
tained from: Director of Psychometric Fellowship Program, Educational Testing 
Service, 20 Nassau Street, Princeton, New Jersey. 




NEW JOURNAL OF TRANSLATIONS 


The Department of Commerce, through its Office of Technical Services (OTS), 
has started a new journal, Technical Translations. 

This journal appears twice a month and contains abstracts of foreign scientific 
publications (including mathematics) for which translations (in many cases 
from the Russian) are available through OTS. 


Further details may be obtained from Office of Technical Services, U. S. 
Department of Commerce, Washington 25, D. C. 




PRELIMINARY ACTUARIAL EXAMINATION PRIZE AWARDS 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1959 Prelim- 
inary Actuarial Examination are as follows: 

First Prize of $200: Harold M. Stark, California Institute of Technology 

Additional Prizes of $100 each: Stephen L. Adler, Harvard University; 
William G. Brown, University of Toronto; Richard M. Dudley, Harvard Uni- 
versity; David D. Grossman, Harvard University; Bertrand I. Halperin, Har- 
vard University; Ralph E. Miller, Harvard University; Steven A. Orszag, 
Forest Hills (N.Y.) High School; Barry Wolk, University of Manitoba. 

The Society of Actuaries has authorized a similar set of nine prizes for the 
May 1960 Examination on Part 2. 

The Preliminary Actuarial Examinations consist of the following three 
examinations: 

Part 1. Language Aptitude Examination. (Reading comprehension, meaning of 
words and word relationships, antonyms, and verbal reasoning.) 

Part 2. General Mathematics Examination. (Algebra, trigonometry, coordinate 
geometry, differential and integral calculus.) 

Part 3. Special Mathematics Examination. (Probability and statistics.) 

The 1960 Preliminary Actuarial Examinations will be prepared by the Educa- 
tional Testing Service under the direction of a committee of actuaries and 
mathematicians and will be administered by the Society of Actuaries at centers 
throughout the United States and Canada on May 11, 1960 and on November 
16, 1960. The closing date for applications for the May Examination is April 1, 
1960, while that for the November Examination is October 1, 1960. Further 
information concerning these Examinations can be obtained from The Society 
of Actuaries, 208 South LaSalle Street, Chicago 4, Illinois. 




PUBLICATIONS RECEIVED 


Anuario Estadístico de España, Año XXXIII, Presidencia del Gobierno, Instituto Nacional 
de Estadística, Ferraz 41, Madrid, Spain, 1958, 1382 pp. 

Armour Research Foundation of Illinois Institute of Technology, “Proceedings of the 
Fifth Annual Computer Applications Symposium,” October 29-30, 1958, 10 West 35th 
Street, Chicago 16, Illinois, $3.00, 153 pp. 

Schlaifer, Robert, Probability and Statistics for Business Decisions, McGraw-Hill Book 
Company, Inc., 330 West 42nd Street, New York 36, N. Y., 1959, $11.50, 732 pp. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 27, No. 3, July 1959 


Franklin M. Fisher: Generalization of the Rank and Order Conditions for Identifiability 

W. H. Gorman: Separable Utility Functions in the A; — Problem 

W. H. Gorman: The Empirical Implications of a Utility Tree: A Comment 

T. P. Hill: An Analysis of the Distribution of Wages and Salaries in Great Britain 

Thomas Marschak: Centralization and Decentralization in Economic Organizations 

André Nataf and C. Fourgeaud: Consommation en Prix et Revenu Réels et Théorie des Choix 

Joan Robinson: Letter to the Editor 

Robert H. Strotz: The Utility Tree: A Correction and Further Appraisal 

Hirofumi Uzawa: The Prices of Factors of Production in International Trade 

Philip Wolfe: The Simplex Method for Quadratic Programming 


BOOK REVIEWS 


Elements of Mathematical Biology (A. J. Lotka). Review by Herbert A. Simon. 
The Theory of Games and Linear Programming (S. Vajda). Review by W. W. Cooper. 
Nogyo No Dotai Bunseki (Dynamic Analysis of Agriculture, Kazushi Ohkawa). Review by James I. Nakamura. 
Fondements d’une Théorie Positive des Choix Comportant un Risque et Critiques des Postulats et Axiomes de l’École Américaine (Maurice Allais). Review by Patrick Suppes. 
Trade and Welfare (Meade). Review by Arnold C. Harberger. 
Measuring Business Changes: A Handbook of Significant Business Indicators. Review by Victor Zarnowitz. 
National Income Analysis and Forecasting (Robert M. Biggs). Review by K. C. Kuhlo. 
Risk and Gambling. Review by Mosteller. 
Cahiers du Séminaire d’Econométrie No. 4. Review by Walter D. Fisher. 
Théories contemporaines [...] (Bernard Biet). Review by K. E. Boulding. 
International Bibliography of Economics, Vols. III and IV. Review by Otto H. Ehrlich. 
Influence de la nationalisation sur la gestion des entreprises publiques ([...] Maillet-Chassagne). Review by Walter Froehlich. 
Das Rechnungswesen im Dienste der Leitung (Henrik Virkkunen). Review by Eric Schiff. 
Wahrscheinlichkeitstheorie (H. Richter). Review by J. Wolfowitz. 
Study of Consumer [...]. Review by Dr. Helga Schmucker. 
The Analysis of Multiple Time-Series (Quenouille). Review by M. Rosenblatt. 
Das Wahrheitsproblem und die Idee der Semantik: Eine Einführung in die Theorien (A. Tarski and R. Carnap). Review by E. Fels. 
Theoretical Welfare Economics (J. de V. Graaff). Review by William J. Baumol. 
Introduction to Mathematical Economics (D. W. Bushaw and R. W. Clower). Review by William J. Baumol. 
Economics as a Science (Papandreou). Review by S. Sankar Sengupta. 


ANNOUNCEMENTS AND NOTES 





ESTADISTICA 


Journal of the Inter American Statistical Institute 
Volume XVII, No. 62, March 1959. Contents 


Address to the VI Session of the Committee on Improvement of National Statistics 
(COINS), Julio H. Montenegro 

The statistical method and the philosophy of science (translation), Harold Hotelling 
The measurement of population distribution (translation), Otis Dudley Duncan 
Investment calculations in underdeveloped countries: A critical appraisal (translation), 
William I. Abraham 
The formula of a new index number (translation) 
The interindustry relations study for 1947, Part II, Duane Evans and Marvin Hoffenberg 
Epidemiological statistics in Venezuela, Darío Curren 
Special Feature: United Nations Statistical Commission: Report on the Tenth Session, 
28 April to 15 May 1958 


Legal Provisions. International Resolutions Relating to Statistics. 
Institute Affairs. Statistical News. Publications. 
Published quarterly. Annual subscription price $3.00 (U.S.) 
INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 











JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Volume 54 September, 1959 Number 287 


Some Methodological Notes on the Deflation of Construction, Norman M. Kaplan 
An Econometric Model for United States Agriculture, William A. Cromarty 
A Note on the Relationship Between Earning Expectations and New Car Purchases, Peter E. de Janosi 
The Accuracy of Census Literacy Statistics in Iran, Charles Windle 
Sources of Statistics on Crime and Correction, Ronald H. Beattie 
Publication Decisions and Tests of Significance: A Comment, Gordon Tullock 
Some Finite Population Unbiased Ratio and Regression Estimators, M. R. Mickey 
Confidence Intervals for the Means of Dependent Normally Distributed Variables, Olive Jean Dunn 
A Basis for the Selection of a Response Surface Design, G. E. P. Box and Norman R. Draper 
Remarks on Zeros and Ties in the Wilcoxon Signed Rank Procedure, John W. Pratt 
Graphic Methods Based upon Properties of Advancing Centroids, S. I. Askovitz 
Optimal Confidence Intervals for the Variance of a Normal Distribution, R. F. Tate and G. W. Klett 
Extended Tables of the Percentage Points of Student’s t-Distribution, Enrico T. Federighi 
Percentage Points for the Distribution of Outgoing Quality, G. P. Steck and D. B. Owen 


AMERICAN STATISTICAL ASSOCIATION 


BEACON BUILDING 
1757 K STREET, N. W. 
WASHINGTON 6, D. C. 


JOURNAL OF THE 


ROYAL STATISTICAL SOCIETY 


Series B (Methodological) 





Vol. 21, No. 1, 1959. 30s per part; annual subscription, including postage, 
£3 2s 0d 

CONTENTS 


Geometric Distributions in the Theory of Queues. By C. B. Winsten. (With Discussion). 

Behaviour Sequences as Semi-Markov Chains. By Violet R. Cane. (With Discussion). 

On a Multivariate Version of Fieller’s Theorem. By B. M. Bennett. 

The Process Curve and the Equivalent Mixed Binomial with Two Components. By M. K. 
Vagholkar. 

The Information in an Experiment. By C. L. Mallows. 

The Behrens-Fisher Distribution and Weighted Means. By G. S. James. 

The Estimation of Relationships with Autocorrelated Residuals by the Use of Instrumental 
Variables. By J. D. Sargan. 

On Some Problems of Machine Interference. By S. K. Nasr. 

On the Distribution of Various Sums of Squares in an Analysis of Variance Table for Different 
Classifications with Correlated and Non-Homogeneous Errors. By B. R. Bhat. 

Contagious Occupancy. By D. E. Barton and F. N. David. 

The Limiting Frequencies of Integers with a Given Partitional Characteristic. By W. F. Bodmer. 

Some Simple Duration-Dependent Stochastic Processes. By A. Mercer. 

Cyclic Queues with Feedback. By P. D. Finch. 

A Statistical Theory of Remnants. By J. Aitchison. 

Bandwidth and Resolvability in Statistical Spectral Analysis. By Z. A. Lomnicki and 
S. K. Zaremba. 

On a Property of Incomplete Blocks. By R. Morley Jones. 

A Renewal Problem with Bulk Ordering of Components. By D. R. Cox. 

The Dispersion of a Number of Species. By D. E. Barton and F. N. David. 

On a Discriminatory Problem Connected with the Works of Plato. By D. R. Cox and 
L. Brandwood. 

Experiments with Mixtures. By M. H. Quenouille. 

A Different Loss Function for the Choice Between Two Populations. By Rita J. Maurice. 

Censored Observations in Randomized Block Experiments. By M. R. Sampford and J. Taylor. 

Corrigenda: 
Dilution Series: A Statistical Test of Technique. By W. L. Stevens. 
The Regression Analysis of Binary Sequences. By D. R. Cox. 
Experiments with Mixtures. By H. Scheffé. 


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 

















































THE PROCEEDINGS OF THE SYMPOSIA IN 
APPLIED MATHEMATICS 


These symposia were held under the auspices of the American Mathe- 
matical Society and other interested organizations. The Society itself 
published the first two volumes. The McGraw-Hill Book Company, Inc. 
published and sold Numbers 3 through 8. These six volumes have now 
been transferred to the American Mathematical Society by special 
arrangement with the McGraw-Hill Book Company, Inc. Orders should 
be placed through the Society. 


Members of the Society will henceforth be entitled to the usual 25% 
discount on all of the volumes. Heretofore this privilege was available 
to members only for the first two volumes. 


Vol. 1. Non-linear problems in mechanics of continua. 1949. 
vii + 219 pp. List price: $5.25 
Vol. 2. Electromagnetic theory. 1950. 
iii + 91 pp. List price: $3.00 
Vol. 3. Elasticity. 1950. 
vi + 233 pp. List price: $6.00 
Vol. 4. Fluid dynamics. 1953. 
vi + 186 pp. List price: $7.00 
Vol. 5. Wave motion and vibration theory. 1954. 
vi + 169 pp. List price: $7.00 
Vol. 6. Numerical analysis. 1956. 
vi + 303 pp. List price: $9.75 
Vol. 7. Applied probability. 1957. 
v + 104 pp. List price: $5.00 
Vol. 8. Calculus of variations and its applications. 1958. 
v + 153 pp. List price: $7.50 


IN PREPARATION: 


Volume 9. Orbit theory. Editor R. E. Langer, Approx. $8.00. 
Volume 10. Combinatorial designs and analysis. Editor R. E. Bellman 


Order from 
AMERICAN MATHEMATICAL SOCIETY 
190 Hope Street, Providence 6, R.I. 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 21, Parts 1 & 2, 1959 


CONTENTS: Bengal Anthropometric Survey, 1945: A Statistical Study 


Relation between stature and blood group among Indian soldiers 


National Sample Survey Number Eleven: 
The Sample Survey of Manufacturing Industries 1949 and 1950 


National Sample Survey Number Twelve: 
A technical note on age grouping 


A partial order and its applications to probability theory, T. V. Narayana 
Random processes in economic theory and analysis, P. A. P. Moran 
Expressions for the lower bound to confidence coefficients, Sarpat Kumar Bansruge 
A pilot health survey in West Bengal, 1955, S. J. Poti, M. V. Raman, S. Biswas and B. Chakraborti 
On recall lapse in infant death reporting, Ranjan Kumar Som and Nitai Chandra Das 


Annual Subscription: 30 rupees ($10.00); 10 rupees ($3.50) per issue. 
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue. 
Subscriptions and orders for back numbers should be sent to 
STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 






















o 





